bnew


Computer Science > Computation and Language​

[Submitted on 27 Nov 2023 (this version), latest version 4 Dec 2023 (v2)]

YUAN 2.0: A Large Language Model with Localized Filtering-based Attention​

Shaohua Wu, Xudong Zhao, Shenling Wang, Jiangang Luo, Lingjun Li, Xi Chen, Bing Zhao, Wei Wang, Tong Yu, Rongguo Zhang, Jiahua Zhang, Chao Wang
In this work, Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of the local dependencies of natural language into attention. Based on LFA, we develop and release Yuan 2.0, a large language model with parameters ranging from 2.1 billion to 102.6 billion. A data filtering and generation method is presented to build high-quality pretraining and fine-tuning datasets. A distributed training method with non-uniform pipeline parallelism, data parallelism, and optimizer parallelism is proposed, which greatly reduces the bandwidth requirements of intra-node communication and achieves good performance in large-scale distributed training. Yuan 2.0 models display impressive ability in code generation, math problem solving, and chat compared with existing models. The latest version of Yuan 2.0, including model weights and source code, is accessible on GitHub.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as: arXiv:2311.15786 [cs.CL]
(or arXiv:2311.15786v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2311.15786

Submission history​

From: Tong Yu
[v1] Mon, 27 Nov 2023 13:01:59 UTC (1,242 KB)
[v2] Mon, 4 Dec 2023 10:20:57 UTC (1,245 KB)
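
The abstract gives no code, but the core LFA idea, biasing attention toward local dependencies, can be sketched. The snippet below is only an illustrative guess in PyTorch, assuming a small causal depthwise 1-D convolution is applied to token representations before ordinary self-attention; the class and parameter names are hypothetical and not taken from the Yuan 2.0 release.

```python
# Illustrative sketch only -- NOT the Yuan 2.0 / LFA implementation.
# Assumption: local dependencies are injected by a small causal depthwise
# 1-D convolution over the sequence before ordinary self-attention.
import torch
import torch.nn as nn

class LocalFilterAttention(nn.Module):  # hypothetical name
    def __init__(self, d_model: int, n_heads: int, kernel_size: int = 3):
        super().__init__()
        self.local_filter = nn.Conv1d(
            d_model, d_model, kernel_size,
            groups=d_model, padding=kernel_size - 1,
        )
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # convolve over the time axis, then trim the right side to stay causal
        h = self.local_filter(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        out, _ = self.attn(h, h, h, need_weights=False)
        return out
```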

 

bnew


Computer Science > Computation and Language​

[Submitted on 28 Nov 2023]

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine​

Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz
Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM. We build on a prior study of GPT-4's capabilities on medical challenge benchmarks in the absence of special training. Rather than using simple prompting to highlight the model's out-of-the-box capabilities, we perform a systematic exploration of prompt engineering. We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks. The prompting methods we explore are general purpose, and make no specific use of domain expertise, removing the need for expert-curated content. Our experimental design carefully controls for overfitting during the prompt engineering process. We introduce Medprompt, based on a composition of several prompting strategies. With Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark datasets in the MultiMedQA suite. The method outperforms leading specialist models such as Med-PaLM 2 by a significant margin with an order of magnitude fewer calls to the model. Steering GPT-4 with Medprompt achieves a 27% reduction in error rate on the MedQA dataset over the best methods to date achieved with specialist models and surpasses a score of 90% for the first time. Beyond medical problems, we show the power of Medprompt to generalize to other domains and provide evidence for the broad applicability of the approach via studies of the strategy on exams in electrical engineering, machine learning, philosophy, accounting, law, nursing, and clinical psychology.
Comments: 21 pages, 7 figures
Subjects: Computation and Language (cs.CL)
ACM classes: I.2.7
Cite as: arXiv:2311.16452 [cs.CL]
(or arXiv:2311.16452v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2311.16452

Submission history​

From: Eric Horvitz
[v1] Tue, 28 Nov 2023 03:16:12 UTC (2,654 KB)
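
The abstract describes Medprompt only as "a composition of several prompting strategies"; per the paper these include dynamic few-shot example selection, self-generated chain of thought, and choice-shuffle ensembling. Below is a rough sketch of that last ingredient, majority voting over prompts whose answer options have been shuffled; `ask_model` is a hypothetical stand-in for an LLM API call, not the authors' code.

```python
# Rough sketch of choice-shuffle ensembling, one ingredient of Medprompt.
# `ask_model` is a hypothetical stand-in for an LLM API call, not the paper's code.
import random
from collections import Counter

def ask_model(question: str, options: dict) -> str:
    """Return the letter ('A', 'B', ...) that the model picks. Stub."""
    raise NotImplementedError("plug in your LLM call here")

def choice_shuffle_ensemble(question: str, options: dict, n_votes: int = 5, seed: int = 0) -> str:
    """Majority-vote the model's answer over prompts with shuffled answer order."""
    rng = random.Random(seed)
    letters = sorted(options)            # e.g. ['A', 'B', 'C', 'D']
    votes = Counter()
    for _ in range(n_votes):
        texts = [options[l] for l in letters]
        rng.shuffle(texts)               # reassign answer texts to letters
        shuffled = dict(zip(letters, texts))
        picked = ask_model(question, shuffled)
        votes[shuffled[picked]] += 1     # vote for the underlying answer text
    return votes.most_common(1)[0][0]
```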

 

bnew






OpenHermes-2.5-neural-chat-v3-2-Slerp​


Open LLM Leaderboard Evaluation Results​

Detailed results can be found here

Metric | Value
Avg. | 70.2
ARC (25-shot) | 67.49
HellaSwag (10-shot) | 85.42
MMLU (5-shot) | 64.13
TruthfulQA (0-shot) | 61.05
Winogrande (5-shot) | 80.3
GSM8K (5-shot) | 63.08
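
For context, the "Avg." value is just the arithmetic mean of the six benchmark scores above; a quick check:

```python
# The "Avg." figure is the arithmetic mean of the six benchmark scores above.
scores = [67.49, 85.42, 64.13, 61.05, 80.3, 63.08]
print(round(sum(scores) / len(scores), 1))  # 70.2
```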






Computer Science > Machine Learning​

[Submitted on 2 Jun 2023 (v1), last revised 27 Oct 2023 (this version, v2)]

TIES-Merging: Resolving Interference When Merging Models​

Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a solution to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, number of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters, and highlight the importance of resolving sign interference. Our code is available at this https URL
Comments: Published at NeurIPS 2023, 23 pages, 13 figures, 14 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2306.01708 [cs.LG]
(or arXiv:2306.01708v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2306.01708

Submission history​

From: Prateek Yadav
[v1] Fri, 2 Jun 2023 17:31:32 UTC (365 KB)
[v2] Fri, 27 Oct 2023 01:09:31 UTC (567 KB)
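
The trim / elect sign / disjoint merge steps map naturally onto simple tensor operations. The sketch below is a simplified illustration on task vectors (fine-tuned minus pre-trained weights), not the authors' released implementation; the keep-fraction `k` and function name are illustrative.

```python
# Simplified sketch of TIES-Merging on task vectors (fine-tuned minus pre-trained
# weights). Not the authors' released code; the keep-fraction `k` is illustrative.
import torch

def ties_merge(task_vectors, k=0.2):
    trimmed = []
    for tv in task_vectors:
        # 1) TRIM: zero out all but the top-k fraction of largest-magnitude changes
        n_keep = max(1, int(k * tv.numel()))
        thresh = tv.abs().flatten().topk(n_keep).values.min()
        trimmed.append(torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)                 # (n_models, *param_shape)
    # 2) ELECT SIGN: per parameter, the sign with the larger total magnitude wins
    elected_sign = torch.sign(stacked.sum(dim=0))
    # 3) DISJOINT MERGE: average only the entries that agree with the elected sign
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agree.sum(dim=0).clamp(min=1)
    merged = (stacked * agree).sum(dim=0) / counts
    return merged  # add back onto the pre-trained weights to obtain the merged model
```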

 

bnew


QuIP#: QuIP with Lattice Codebooks​

Albert Tseng*, Jerry Chee*, Qingyao Sun, Volodymyr Kuleshov, and Chris De Sa








Large language models (LLMs) exhibit amazing performance on a wide variety of tasks such as text modeling and code generation. However, they are also very large. For example, Llama 2 70B has 70 billion parameters, which require 140GB of memory to store in half precision. This presents many challenges, such as needing multiple GPUs just to serve a single LLM. To address these issues, researchers have developed compression methods that reduce the size of models without destroying performance.



One class of methods, post-training quantization, compresses trained model weights into lower precision formats to reduce memory requirements. For example, quantizing a model from 16 bit to 2 bit precision would reduce the size of the model by 8x, meaning that even Llama 2 70B would fit on a single 24GB GPU. In this work, we introduce QuIP#, which combines lattice codebooks with incoherence processing to create state-of-the-art 2 bit quantized models. These two methods allow QuIP# to significantly close the gap between 2 bit quantized LLMs and unquantized 16 bit models.
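
As a back-of-the-envelope check of the numbers quoted above (140GB at half precision, 8x smaller at 2 bits, fitting on a 24GB GPU), here is the rough arithmetic; real quantized checkpoints carry some extra codebook and scale overhead:

```python
# Back-of-the-envelope memory arithmetic for the figures quoted above.
params = 70e9                      # Llama 2 70B
fp16_gb = params * 16 / 8 / 1e9    # 16 bits = 2 bytes per weight
int2_gb = params * 2 / 8 / 1e9     # 2 bits per weight (ignores codebook/scale overhead)
print(f"fp16: {fp16_gb:.0f} GB, 2-bit: {int2_gb:.1f} GB, ratio: {fp16_gb / int2_gb:.0f}x")
# -> fp16: 140 GB, 2-bit: 17.5 GB, ratio: 8x (17.5 GB fits on a single 24GB GPU)
```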



Quantization results on Llama 2 70B. QuIP# achieves near-native performance at 2 bits, outperforming all other presented baselines.
Method | Precision | Wiki ↓ | C4 ↓ | ArcE ↑ | PiQA ↑
Native | 16 bit | 3.120 | 5.533 | 0.597 | 0.809
OPTQ | 3 bit | 4.577 | 6.838 | 0.544 | 0.786
OPTQ | 2 bit | 109.820 | 62.692 | 0.253 | 0.505
QuIP | 2 bit | 5.574 | 8.268 | 0.544 | 0.751
QuIP# | 2 bit | 4.156 | 6.545 | 0.595 | 0.785
Our method, QuIP#, creates 2 bit LLMs that achieve near-native performance, a previously unseen result. We provide a full suite of 2 bit Llama 1 and 2 models quantized using QuIP#, as well as a full codebase that allows users to quantize and deploy their own models. We also provide CUDA kernels that accelerate inference for QuIP# models. Our code is available here.
 

bnew



Introducing EAGLE, a new method for fast LLM decoding based on compression:
- 3x 🚀 than vanilla decoding
- 2x 🚀 than Lookahead (on its benchmark)
- 1.6x 🚀 than Medusa (on its benchmark)
- provably maintains the text distribution
- trainable (in 1~2 days) and testable on RTX 3090s

Playground: Gradio
Blog: EAGLE
Code: GitHub - SafeAILab/EAGLE: EAGLE: Lossless Acceleration of LLM Decoding by Feature Extrapolation

⚒️ First Principle: Compression! @YiMaTweets We find that the sequence of second-top-layer features is compressible, which makes it easy for a small model to predict subsequent feature vectors from previous ones.

🙏 Acknowledgements: This project is greatly inspired by the Medusa team (@tianle_cai @yli3521 @ZhengyangGeng @Hongwu_Peng @tri_dao), the Lookahead team (@haozhangml @lmsysorg), and others.

Joint work with Yuhui Li and Chao Zhang
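
For intuition, here is a conceptual sketch of the draft-then-verify loop described in the thread: a small head extrapolates upcoming second-top-layer features and drafts tokens, and the full model verifies them so the output distribution is preserved. All object and method names (`big_model.features`, `draft_head.propose`, `big_model.verify`) are hypothetical placeholders, not the SafeAILab/EAGLE API; see the linked repo for the real implementation.

```python
# Conceptual sketch of EAGLE-style draft-then-verify decoding.
# NOT the SafeAILab/EAGLE API: `big_model`, `draft_head`, and their methods
# are hypothetical placeholders used only to show the control flow.
def eagle_decode(prompt_ids, big_model, draft_head, max_new=128, k=4):
    ids = list(prompt_ids)
    # second-top-layer feature of the prompt (the quantity EAGLE extrapolates)
    feature = big_model.features(ids)
    while len(ids) < len(prompt_ids) + max_new:
        # 1) DRAFT: a small head cheaply extrapolates the next k features/tokens
        draft_tokens, feature = draft_head.propose(feature, ids, k)
        # 2) VERIFY: one forward pass of the big model accepts a prefix of the
        #    draft, so the output distribution matches ordinary decoding
        accepted = big_model.verify(ids, draft_tokens)
        if accepted:
            ids.extend(accepted)
        else:
            # nothing accepted: fall back to one ordinary decoding step
            ids.append(big_model.sample_next(ids))
    return ids
```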
 

bnew

How should regulators think about "AI"?
 

bnew


Microsoft’s Edge Copilot AI can’t really summarize every YouTube video


The chatbot’s summarization feature relies on preprocessed video data or existing subtitles and transcripts.​


By Amrita Khalid, one of the authors of audio industry newsletter Hot Pod. Khalid has covered tech, surveillance policy, consumer gadgets, and online communities for more than a decade.

Dec 8, 2023, 9:18 PM EST


[Image: The Microsoft Edge web browser logo against a swirling blue background. Image: The Verge]

One feature added to Microsoft’s AI Copilot in the Edge browser this week is the ability to generate text summaries of videos. But Edge Copilot’s time-saving feature is still fairly limited and only works on pre-processed videos or those with subtitles, as Mikhail Parakhin, Microsoft’s CEO of advertising and web services, explained.

As spotted by MSPowerUser, Parakhin writes, “In order for it to work, we need to pre-process the video. If the video has subtitles - we can always fallback on that, if it does not and we didn’t preprocess it yet - then it won’t work,” in response to a question.



In other words, on its own Edge Copilot doesn’t so much summarize videos as it summarizes the text transcripts of the videos. Copilot can also perform a similar function throughout Microsoft 365, including summarizing Teams video meetings and calls for customer service agents — and in both cases, the audio needs to be transcribed first by Microsoft. Copilot on Microsoft Stream can also summarize any video, but again, it requires users to generate a written transcript.

[Screenshot. Image: Microsoft]


The conversation started after designer Pietro Schirano posted a screen recording of Edge Copilot summarizing a YouTube video about the GTA VI trailer. In this case, Copilot appeared to be doing its job perfectly. The user in the recording presses the “Generate video summary” button in the Copilot sidebar, and mere seconds later, Copilot churns one out, complete with highlights and timestamps.

Of course, many platforms, including YouTube and Vimeo, can automatically generate transcripts and subtitles — if users enable the feature. After The Verge asked Parakhin on X if we could assume most publicly available videos (i.e. YouTube) weren’t pre-processed, he replied: “Should work for most videos.”

Copilot is just the latest example of the generative AI race Microsoft is competing in with Google (and others). Last month, Google upgraded the YouTube extension for its Bard chatbot to enable it to summarize the content of a video and surface specific information from it. Just this week, Google announced a major Gemini update that has its own issues — the company’s editing may have misrepresented some of the AI’s capabilities in a demo, and it doesn’t always have its facts straight.

Parakhin has been candid about the various stages of Copilot’s evolution on social media. While on a plane on Tuesday morning, the machine learning expert posted on X: “Adding ability for Edge Copilot to use information in videos - on a flight.”
 

bnew

well "good" depends on what you intend to use them for. LLM's have their strengths and weaknesses. there are some specialized/fine-tuned ones and general models like chatgpt. GPT-4 is superior overall but some open source ones have surpassed chatgpt 3.5 turbo. heres a list of LLM leaderboards and benchmark sites that list many models.






basic info:


opensource LLM demo sites:



 