The A.I Megathread (LLM , GPT , Development)

bnew · Nov 25, 2023

https://archive.is/B41ng

bnew · Nov 25, 2023

https://archive.is/tJJhI

IIVI · Nov 25, 2023

bnew · Nov 25, 2023

IIVI said:

https://archive.is/T2O49

bnew · Nov 25, 2023

https://archive.is/8U2P7

https://archive.is/9Nsz5

DEMO:
deepseek-coder-7b-instruct
DeepSeek-6.7B-Chat
This space demonstrates model DeepSeek-Coder by DeepSeek, a code model with 6.7B parameters fine-tuned for chat instructions.

Chat with DeepSeek Coder 7B - a Hugging Face Space by deepseek-ai

huggingface.co

DeepSeek

Chat with DeepSeek AI.

chat.deepseek.com

I did some further tests on this yesterday and today, and with the right system prompts this model gave me better responses than CodeLlama 34B and even some 70B Llama 2 fine-tunes.

I ran into issues where it got repetitive when I wanted additional changes beyond the initial prompt, but it seems the wording has to be right to really get it to do what you want. For instance, I asked it to rewrite the code it gave me and it spit out the exact same code. Then I asked it to refactor the code, and it modified it so that the intended functionality stayed the same but was implemented differently. I was mostly having it write JavaScript code.

edit:

bnew · Nov 25, 2023

https://archive.is/dFnyv

bnew · Nov 26, 2023

https://archive.is/QI3dP

bnew · Nov 27, 2023

https://archive.is/IHZll

bnew · Nov 27, 2023

bnew · Nov 27, 2023

bnew · Nov 27, 2023

https://archive.is/8BVN6

bnew · Nov 27, 2023

https://archive.is/ZtaP1

bnew · Nov 27, 2023

[2309.05653] MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Computer Science > Computation and Language

[Submitted on 11 Sep 2023 (v1), last revised 3 Oct 2023 (this version, v3)]

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales with an average accuracy gain between 16% and 32%. Remarkably, our MAmmoTH-7B model reaches 33% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 23%, and the MAmmoTH-34B model achieves 44% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models.

Comments:	Work in progress; Xiang Yue and Wenhu Chen contributed equally to this paper
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2309.05653 [cs.CL]
	(or arXiv:2309.05653v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.05653

Submission history

From: Xiang Yue
[v1] Mon, 11 Sep 2023 17:47:22 UTC (608 KB)
[v2] Sun, 1 Oct 2023 15:25:41 UTC (717 KB)
[v3] Tue, 3 Oct 2023 02:48:42 UTC (717 KB)

https://arxiv.org/pdf/2309.05653.pdf
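
To make the CoT vs. PoT distinction in the MAmmoTH abstract concrete, here is a minimal sketch in Python. It is not code from the paper: the word problem, both rationales, and the helper names run_pot_rationale and extract_cot_answer are made up for illustration. The idea is only that a chain-of-thought rationale carries its answer in text, while a program-of-thought rationale is a small program whose output is the answer.

```python
import contextlib
import io
import re


def run_pot_rationale(program: str) -> str:
    """Run a program-of-thought rationale and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # exec() on model output is unsafe outside a sandbox; this is a toy demo.
        exec(program, {})
    return buffer.getvalue().strip()


def extract_cot_answer(rationale: str) -> float:
    """Pull the final number out of a chain-of-thought rationale."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", rationale)
    if match is None:
        raise ValueError("no final answer found in rationale")
    return float(match.group(1))


# Hypothetical rationales for: "A train travels at 60 km/h for 2.5 hours.
# How far does it go?" (both strings are invented, not model output).
cot_rationale = (
    "The train covers 60 km each hour, so in 2.5 hours it covers "
    "60 * 2.5 = 150 km. The answer is 150."
)

pot_rationale = """
speed_kmh = 60
hours = 2.5
distance_km = speed_kmh * hours
print(distance_km)
"""

cot_answer = extract_cot_answer(cot_rationale)      # arithmetic done in text
pot_answer = float(run_pot_rationale(pot_rationale))  # arithmetic done by Python
```

In the PoT case the arithmetic is delegated to the interpreter, which is the "tool use" the abstract refers to. In the CoT case the model has to get the arithmetic right in its own text.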

bnew · Nov 27, 2023

https://archive.is/CGwjD

System 2 Attention (is something you might need too)

Computer Science > Computation and Language

[Submitted on 20 Nov 2023]

System 2 Attention (is something you might need too)

Jason Weston, Sainbayar Sukhbaatar

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information, QA, math word problems and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2311.11829 [cs.CL]
	(or arXiv:2311.11829v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.11829

Submission history

From: Jason Weston
[v1] Mon, 20 Nov 2023 15:04:50 UTC (97 KB)

AI summary:

System 2 Attention is a way to help LLMs understand which parts of the input are important and which are not. It does this by looking at the input and deciding which parts are relevant and which ones are not. This helps the LLM generate better responses to questions and problems. It's like when you're reading a book and you only focus on the important parts, instead of reading every single word. This is important because sometimes LLMs can get confused by too much information and give the wrong answer. System 2 Attention helps the LLM focus on the right information and give better answers. It has been shown to work better than standard attention-based LLMs on tasks like answering questions and generating long-form text.
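
The two-step flow described in the abstract (regenerate the context, then answer from the regenerated context) can be sketched roughly like this. This is a sketch under assumptions, not the paper's implementation: the prompt wording in REWRITE_TEMPLATE and ANSWER_TEMPLATE is my own, and llm is a hypothetical callable standing in for whatever model API you use (prompt string in, completion string out).

```python
from typing import Callable

# Hypothetical model interface: takes a prompt, returns the completion text.
LLM = Callable[[str], str]

# Prompt wording is an assumption for illustration, not the paper's prompts.
REWRITE_TEMPLATE = (
    "Rewrite the following text so that it keeps only the parts that are "
    "relevant and unbiased for answering the question. Drop opinions and "
    "unrelated details.\n\n"
    "Text: {context}\n\n"
    "Question: {question}\n\n"
    "Rewritten text:"
)

ANSWER_TEMPLATE = (
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def system2_attention(llm: LLM, context: str, question: str) -> str:
    """Answer a question using a two-pass, S2A-style flow."""
    # Pass 1: ask the model to regenerate the context with only relevant parts.
    rewritten = llm(REWRITE_TEMPLATE.format(context=context, question=question))
    # Pass 2: answer from the regenerated context only; the original context
    # is deliberately not included in this prompt.
    return llm(ANSWER_TEMPLATE.format(context=rewritten, question=question))
```

The design point is that the second call never sees the original context, so an opinion or distractor that was filtered out in the first pass cannot sway the final answer. The cost is one extra model call per query.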

bnew · Nov 27, 2023

https://archive.is/MjFx2

https://archive.is/9RFV0
