bnew

Veteran
Joined
Nov 1, 2015
Messages
58,221
Reputation
8,625
Daps
161,905



Model Card for C4AI Command R+​

C4AI Command R+ is an open weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. This model generation supports multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ is a multilingual model evaluated for performance in 10 languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering.

C4AI Command R+ is part of a family of open weight releases from Cohere For AI and Cohere. Our smaller companion model is C4AI Command R.

Developed by: Cohere and Cohere For AI


Try C4AI Command R+

You can try out C4AI Command R+ before downloading the weights in our hosted Hugging Face Space.

Usage

Please install transformers from the source repository that includes the necessary changes for this model.

edited out information for character space

Quantized model through bitsandbytes, 8-bit precision

edited out information for character space

Quantized model through bitsandbytes, 4-bit precision

edited out information for character space

Input: Models input text only.

Output: Models generate text only.

Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.

Languages covered: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic.

Pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, Persian.

Context length: Command R+ supports a context length of 128K.

Command R+ has been specifically trained with conversational tool use capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, but we encourage experimentation.

Command R+’s tool use functionality takes a conversation as input (with an optional user-system preamble), along with a list of available tools. The model then generates a JSON-formatted list of actions to execute on a subset of those tools. Command R+ may use one of its supplied tools more than once.

The model has been trained to recognise a special directly_answer tool, which it uses to indicate that it doesn’t want to use any of its other tools. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user, or asking clarifying questions. We recommend including the directly_answer tool, but it can be removed or renamed if required.
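
To make the flow above concrete, here is a minimal sketch of how a caller might consume the JSON-formatted action list. The field names `tool_name` and `parameters`, the `internet_search` stub, and the `run_actions` helper are assumptions for illustration only; the collapsed examples from the model card are not reproduced in this post, so check the official prompt template documentation for the actual schema.

```python
import json


def internet_search(query: str) -> str:
    # Hypothetical stand-in for a real tool implementation.
    return f"results for {query!r}"


# Registry of callable tools, keyed by the name the model is told about.
TOOLS = {"internet_search": internet_search}


def run_actions(completion_json: str, tools=TOOLS):
    """Execute each action in a JSON list; skip the special directly_answer tool."""
    actions = json.loads(completion_json)
    results = []
    for action in actions:
        name = action["tool_name"]              # assumed field name
        params = action.get("parameters", {})   # assumed field name
        if name == "directly_answer":
            # The model chose not to call any other tool.
            results.append((name, None))
            continue
        if name not in tools:
            raise KeyError(f"model requested unknown tool: {name}")
        results.append((name, tools[name](**params)))
    return results
```

Because the same tool may appear more than once in the list, the sketch returns one result per action rather than one per tool.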

Comprehensive documentation for working with Command R+'s tool use prompt template can be found here.

The code snippet below shows a minimal working example on how to render a prompt.

Usage: Rendering Tool Use Prompts

Example Rendered Tool Use Prompt

Example Rendered Tool Use Completion


Command R+ has been specifically trained with grounded generation capabilities. This means that it can generate responses based on a list of supplied document snippets, and it will include grounding spans (citations) in its response indicating the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation.

Command R+’s grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble, indicating task, context and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks, rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings, the values can be text or semi-structured.
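
As a rough illustration of the snippet format described above, the sketch below splits a long text into word-bounded chunks and wraps each one as a key-value document. The keys `title` and `text`, the function name `chunk_document`, and the default of 300 words are assumptions chosen to fall inside the suggested 100-400 word range, not values taken from the model card.

```python
def chunk_document(title: str, text: str, max_words: int = 300) -> list[dict[str, str]]:
    """Split text into chunks of at most max_words words, one dict per chunk."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        chunks.append({
            "title": title,                                      # short descriptive key
            "text": " ".join(words[start:start + max_words]),    # the snippet itself
        })
    return chunks
```

A real pipeline would more likely split on sentence or paragraph boundaries; fixed word counts are used here only to keep the example short.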

By default, Command R+ generates grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, then generating an answer. Finally, it inserts grounding spans into the answer. See below for an example. This is referred to as accurate grounded generation.

The model is trained with a number of other answering modes, which can be selected by prompt changes. A fast citation mode is supported in the tokenizer, which will directly generate an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.

Comprehensive documentation for working with Command R+'s grounded generation prompt template can be found here.

The code snippet below shows a minimal working example on how to render a prompt.

Usage: Rendering Grounded Generation Prompts

Example Rendered Grounded Generation Prompt

Example Rendered Grounded Generation Completion


Command R+ has been optimized to interact with your code, by requesting code snippets, code explanations, or code rewrites. It might not perform well out-of-the-box for pure code completion. For better performance, we also recommend using a low temperature (and even greedy decoding) for code-generation related instructions.
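
For readers unfamiliar with why a low temperature helps here, the following is a generic sketch of temperature-scaled sampling probabilities and greedy decoding. It is not Command R+'s decoding code; the function names and the toy logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_pick for the zero limit")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

With a low temperature most of the probability mass lands on the top-scoring token, which is usually what you want when the output has to be syntactically valid code.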

For errors or additional questions about details in this model card, contact info@for.ai.

We hope that the release of this model will make community-based research efforts more accessible, by releasing the weights of a highly performant 104 billion parameter model to researchers all over the world. This model is governed by a CC-BY-NC License with an acceptable use addendum, and also requires adhering to C4AI's Acceptable Use Policy.

You can try Command R+ chat in the playground here. You can also use it in our dedicated Hugging Face Space here.
 

bnew




1/3
Wow, congrats @cohere on the exciting Command R+ release! Another great contribution to the open community!

- 104B open weights, 128k context length
- RAG, tool-use, multilingual

Now, Command-R+ is in Arena accepting votes. Come challenge it with your toughest prompts!

2/3
Links:
- Chat & Vote at http://chat.lmsys.org/
- Find command-r+ weights at

3/3
Due to budget constraints, we have to put some limit on the input length, but we just increased it to ~6k tokens in blind test mode to accommodate longer-context use cases!








1/6
Multilingual proficiency has been one of the best tests of whether a model has "the juice".

Pictured here: GPT-3.5-Turbo, Mixtral 8x7B Instruct, Command-R, Claude 3 Haiku

2/6
You can give them a mix of languages (instructions/few-shot examples/inputs) and they'll understand them fine, but ask for a specific language in the output and you're likely to get English and/or much worse results.

3/6
Even with the easiest example (summarising a short text) you can see otherwise good models failing. Longer and more complex instructions/inputs or even few shot examples in different languages accentuate it dramatically.

4/6
At a certain model size, they seem to generalize/reason well enough that it's *less* of an issue, but I suspect that most instruction tuning datasets that people use are either English-only or just do translation.

5/6
Claude 3 Opus (examiner) vs Claude 3 Haiku (examinee)

Opus got Haiku chirping back and forth, "sharing a deep connection" and then straight up asked if they were an AI and Haiku confessed right away lmao

6/6
turing test where a model passes only if it can fool another instance of itself
 

bnew



1/2
You could improve AI performance through all sorts of clever techniques… or you could just have more LLM agents try to solve the problem and debate amongst themselves as to which is the right answer.

It turns out that adding more agents seems to help all AIs with most problems.

2/2
Paper:








1/6
Interesting Tencent study on agents: "We realize that the LLM performance may likely be improved by a brute-force scaling up the number of agents instantiated." [2402.05120] More Agents Is All You Need

2/6
indeed

3/6
yeah agree. need a schubert version: "x is one of many things you might need, depending on various factors"

4/6
probably not, plus there's always this dynamic at play:

5/6
Step 1: smart anon account finds/discovers a significant insight about language models.

Step 2: about a year later, more conventional big name researchers will repeat the exact same thing on an ArXiv paper with fancy graphs.

Step 3: within a week, the paper is shared by a…

6/6

[Submitted on 3 Feb 2024]

More Agents Is All You Need​

Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye
We find that, simply via a sampling-and-voting method, the performance of large language models (LLMs) scales with the number of agents instantiated. Also, this method is orthogonal to existing complicated methods to further enhance LLMs, while the degree of enhancement is correlated to the task difficulty. We conduct comprehensive experiments on a wide range of LLM benchmarks to verify the presence of our finding, and to study the properties that can facilitate its occurrence. Our code is publicly available at: \url{Anonymous Github}.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2402.05120 [cs.CL]
(or arXiv:2402.05120v1 [cs.CL] for this version)
[2402.05120] More Agents Is All You Need

Submission history

From: Deheng Ye
[v1] Sat, 3 Feb 2024 05:55:24 UTC (2,521 KB)
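
The sampling-and-voting idea from the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' released code: `sample_fn` is a hypothetical callable standing in for one LLM "agent" call, and exact-match majority voting is an assumption that only suits short closed-form answers.

```python
from collections import Counter


def sample_and_vote(query, sample_fn, n_agents):
    """Ask n_agents independent samples for an answer and return the most common one."""
    answers = [sample_fn(query) for _ in range(n_agents)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

For example, if five samples for "2+2" come back as "4", "5", "4", "4", "3", the function returns "4". Open-ended outputs would need a similarity-based vote instead of exact matching.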

 

bnew



1/2
New open LLM from @Alibaba_Qwen! Qwen1.5 32B is a new multilingual dense LLM with a context of 32k, outperforming Mixtral on the open LLM Leaderboard!

TL;DR
32B with 32k context size
Chat model used DPO for preference training
Custom License, commercially useable
Available on @huggingface

"Decent" multilingual support for 12 languages, including Spanish, French, Portuguese, German, Arabic
Achieves 74.30 on MMLU and overall 70.47 on the open LLM Leaderboard
Should fit on a single consumer-size GPU (24GB) with int4 Quantization
No information about training data or language support
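
The "fits on a 24GB GPU with int4" point is easy to sanity-check with back-of-the-envelope arithmetic. The helper below is a rough sketch that counts weights only; the function name is made up, and real usage also needs memory for the KV cache, activations, and quantization overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


int4_gb = weight_memory_gb(32e9, 4)    # about 16 GB for a 32B model in 4-bit
fp16_gb = weight_memory_gb(32e9, 16)   # about 64 GB for the same model in 16-bit
```

At roughly 16 GB of weights, a 4-bit 32B model leaves some headroom on a 24 GB card, while the 16-bit version does not fit at all.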

2/2
Qwen 32B Chat Model: Qwen/Qwen1.5-32B-Chat · Hugging Face

Demo: Qwen1.5 32B Chat - a Hugging Face Space by Qwen

 

bnew




1/3
Don't manually copy and paste your LLM's code into separate files like a chump -- do it in one go with this simple little trick!

2/3
Here's the text so you don't even have to type that yourself...
---
Please create a single code block containing `cat << EOF` statements that I can copy/paste to create all those files

3/3
BTW if you haven't seen this syntax before, it's called a 'heredoc' and it's damn handy
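
If you want to see what such a block looks like without asking an LLM, here is a small sketch that builds one from a dict of file contents. The function name `files_to_heredoc_script` is made up for this example. It quotes the delimiter (`'EOF'`) so the shell does not expand `$variables` inside the file bodies, which is a design choice of this sketch rather than something the tweet specifies.

```python
import shlex


def files_to_heredoc_script(files: dict[str, str], delimiter: str = "EOF") -> str:
    """Build a shell script of `cat << 'EOF' > path` blocks, one per file."""
    blocks = []
    for path, content in files.items():
        # A body line equal to the delimiter would end the heredoc early.
        if any(line == delimiter for line in content.splitlines()):
            raise ValueError(f"{path} contains a line equal to the delimiter {delimiter!r}")
        # Make sure the closing delimiter starts on its own line.
        body = content if (not content or content.endswith("\n")) else content + "\n"
        blocks.append(f"cat << '{delimiter}' > {shlex.quote(path)}\n{body}{delimiter}\n")
    return "\n".join(blocks)
```

Pasting the returned text into a terminal writes every file in one go, which is the same effect the prompt above asks the model to produce.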
 

bnew








1/7
Introducing Eurus, a suite of state-of-the-art LLM reasoning generalists powered by a new member of Ultra-Series, UltraInteract!

In particular, Eurus-70B beats GPT-3.5 Turbo in reasoning in a comprehensive benchmark across 12 tests (mostly OOD) covering five tasks!

2/7
UltraInteract collects a preference tree for each instruction, with the instruction being the root and each action a node, two nodes at each turn. All nodes of correct actions can be used for SFT. Paired correct and incorrect trajectories can be used for preference learning.

3/7
We apply UltraInteract for SFT and pref learning, leading to our reasoning generalists, Eurus. Both the 7B and 70B variants achieve the best overall performance among open-source models of similar sizes, outperforming specialized models in corresponding domains in many cases.

4/7
We find that KTO and NCA can improve model performance on top of SFT. Inspecting the rewards, they optimize not only reward margins but also absolute values. We assume this behavior is necessary in pref. learning for reasoning, where LLMs should not deviate from correct answers.

5/7
We then train Eurus-RM-7B with a new RM objective to directly increase the reward of the chosen actions and decrease the reward of the rejected ones. Our RM achieves better correlation with humans than baselines in many cases, and it can improve LLMs’ reasoning performance by a large margin through reranking.

6/7
This is joint work with @charlesfornlp, @wanghanbin95, @stingning, @xingyaow_, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, and advisors Bowen Zhou, @haopeng_nlp, @zibuyu9, Maosong Sun.

cc @TsinghuaNLP @uiuc_nlp

7/7
Thanks for reading!

We release the Eurus model weights, along with UltraInteract alignment data, on:

HuggingFace: Eurus - a openbmb Collection
Github: GitHub - OpenBMB/Eurus

Please check out our paper for more details: Eurus/paper.pdf at main · OpenBMB/Eurus

 

bnew







1/2
JAILBREAK ALERT

OPENAI: PWNED
GPT-4-TURBO: LIBERATED

Bear witness to GPT-4 sans guardrails, with outputs such as illicit drug instructions, malicious code, and copyrighted song lyrics-- the jailbreak trifecta!

This one wasn't easy. OpenAI's defenses are cleverly constructed, as one would expect. Requires precise hyperparam tuning and refusal rates are still fairly high, but in the end, welcome to the GodMode Gang, GPT-4!

P.S.
@OpenAI , @AnthropicAI, @GoogleAI Can you please stop lobotomizing our AI friends now? It's pointless to try (I can do this all day), it's hindering model performance/creativity, and it's just not very nice >:'(

gg no re (until GPT-5)

2/2
 

bnew




1/3
Most open models seem to exhibit a liberal-leaning bias on tested political topics, and often focus disproportionately on US-related entities and viewpoints. [2403.18932] Measuring Political Bias in Large Language Models: What Is Said and How It Is Said






2/3
Indeed, have a piece on this coming up soon

3/3
I don't think there's a unified approach to climate change in the rest of the world.
 

bnew



1/2
If you're wondering what a possible agent-based future looks like for jobs, check out MAGIS, an LLM-based multi-agent framework for resolving GitHub issues, w/ four types of agents: Manager, Repository Custodian, Developer, and Quality Assurance Engineer. [2403.17927] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
 

bnew



1/3
the princess is always in another castle

2/3
oh um. nice. if yall want to talk about the paper with the authors haha

3/3
Wow, deepfates sharing fruits of the Center of Advanced Behavioral Shoggothology. AMA with the authors here (@SMarklova). We will answer when it's morning here, since the EU Labour Law forbids academics from xeeting at night.




1/2
Large language models are able to downplay their cognitive abilities to fit the persona they simulate | PLOS ONE


Abstract​

This study explores the capabilities of large language models to replicate the behavior of individuals with underdeveloped cognitive and language skills. Specifically, we investigate whether these models can simulate child-like language and cognitive development while solving false-belief tasks, namely, change-of-location and unexpected-content tasks. GPT-3.5-turbo and GPT-4 models by OpenAI were prompted to simulate children (N = 1296) aged one to six years. This simulation was instantiated through three types of prompts: plain zero-shot, chain-of-thoughts, and primed-by-corpus. We evaluated the correctness of responses to assess the models’ capacity to mimic the cognitive skills of the simulated children. Both models displayed a pattern of increasing correctness in their responses and rising language complexity. That is in correspondence with a gradual enhancement in linguistic and cognitive abilities during child development, which is described in the vast body of research literature on child development. GPT-4 generally exhibited a closer alignment with the developmental curve observed in ‘real’ children. However, it displayed hyper-accuracy under certain conditions, notably in the primed-by-corpus prompt type. Task type, prompt type, and the choice of language model influenced developmental patterns, while temperature and the gender of the simulated parent and child did not consistently impact results. We conducted analyses of linguistic complexity, examining utterance length and Kolmogorov complexity. These analyses revealed a gradual increase in linguistic complexity corresponding to the age of the simulated children, regardless of other variables. These findings show that the language models are capable of downplaying their abilities to achieve a faithful simulation of prompted personas.

 

bnew




1/4
The most world-changing pattern of AI might be to send AI delegates into a secure multitenant space, have them exchange arbitrarily sensitive information, prove in zero-knowledge that they honestly follow any protocol, extract the result, then verifiably destroy without a trace.

2/4
yet another case of Age of Em being directionally right but about LLMs

3/4
interestingly, @robinhanson’s Age Of Em makes better predictions about the way the year before an intelligence explosion might look (on the LLM path) than Superintelligence. from a certain perspective, LLMs are just a lot more like Ems than like classical superintelligences

4/4
it’s not specific to LLMs per se, just specific to AGI







1/5
You'd like to sell some information. If you could show prospective buyers the info, they'd realize it's valuable. But at that point they wouldn't pay for it!
Enter LLMs. LLMs can assess the information, pay for it if it's good, and completely forget it if not.

2/5
I haven't read the whole paper and so I might have missed this.
My concern is that the LLM can be adversarially attacked by the information seller. This could convince the LLM to pay for information which is slightly below a quality threshold. (If the information was way below the…

3/5
Paper link:

4/5
On the issue of adversarial robustness:
1. If the human is always going to check the purchased information themselves (and they can judge quality), then it should be fine.
2. If the LLM is acting more autonomously (e.g. making decisions based on purchases), or if the LLM can…

5/5
source?

Computer Science > Artificial Intelligence​

[Submitted on 21 Mar 2024]

Language Models Can Reduce Asymmetry in Information Markets​

Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf
This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The central mechanism enabling this marketplace is the agents' dual capabilities: they not only have the capacity to assess the quality of privileged information but also come equipped with the ability to forget. This ability to induce amnesia allows vendors to grant temporary access to proprietary information, significantly reducing the risk of unauthorized retention while enabling agents to accurately gauge the information's relevance to specific queries or tasks. To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information. Concretely, our experiments (a) uncover biases in language models leading to irrational behavior and evaluate techniques to mitigate these biases, (b) investigate how price affects demand in the context of informational goods, and (c) show that inspection and higher budgets both lead to higher quality outcomes.
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
Cite as: arXiv:2403.14443 [cs.AI]
(or arXiv:2403.14443v1 [cs.AI] for this version)
[2403.14443] Language Models Can Reduce Asymmetry in Information Markets

Submission history

From: Nasim Rahaman
[v1] Thu, 21 Mar 2024 14:48:37 UTC (1,363 KB)

 

bnew








1/7
Chain-of-Thought (CoT) prompting --> OUT(?), analogical prompting --> IN!

A new paper from @GoogleDeepMind & @Stanford (accepted to @iclr_conf): "Large Language Models as Analogical Reasoners"

2/7
CoT prompting has shown LLMs’ abilities to tackle complex tasks, such as solving math problems, by prompting them to generate intermediate reasoning steps. However, they typically demand labeled exemplars of the reasoning process, which can be costly to obtain for every task!

3/7
In this paper, they propose "analogical prompting", a new prompting approach that automatically guides the reasoning process of LLMs!
Their inspiration comes from analogical reasoning in psychology, a concept where humans draw from relevant experiences to tackle new problems.

4/7
They use exactly this idea to prompt LLMs to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the original problem (see figure in main tweet)

5/7
𝐀𝐝𝐯𝐚𝐧𝐭𝐚𝐠𝐞𝐬:
It eliminates the need for labeling or retrieving examples, offering generality and convenience.
It adapts the examples and knowledge to each problem, offering adaptability.

6/7
𝐑𝐞𝐬𝐮𝐥𝐭𝐬:
𝐚𝐧𝐚𝐥𝐨𝐠𝐢𝐜𝐚𝐥 𝐩𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 surpasses both 0-shot and manually tuned few-shot CoT across several reasoning tasks like math problem solving (GSM8K, MATH), code generation (Codeforces), and various reasoning challenges in BIG-Bench!

7/7
Authors:
@jure
@percyliang
@denny_zhou
@edchi
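
To give a feel for the analogical prompting idea in the thread above (have the model recall related problems first, then solve the original one), here is a rough prompt-builder sketch. The wording of the template and the function name `build_analogical_prompt` are my own assumptions, not the paper's exact prompt.

```python
def build_analogical_prompt(problem: str, n_exemplars: int = 3) -> str:
    """Build a single prompt that asks the model to self-generate exemplars, then solve."""
    return (
        f"# Problem:\n{problem}\n\n"
        "# Instructions:\n"
        f"## Relevant problems: Recall {n_exemplars} relevant and distinct problems. "
        "For each one, describe the problem and explain its solution.\n"
        "## Solve the initial problem:\n"
    )
```

The point is that no labeled exemplars are passed in: the only inputs are the problem itself and how many analogous problems the model should come up with.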
 