Large Language Models News & Discussions

bnew · Jun 24, 2024

1/11
This paper seems very interesting: say you train an LLM to play chess using only transcripts of games of players up to 1000 Elo. Is it possible that the model plays better than 1000 Elo (i.e. "transcends" the training data performance)? It seems you get something from nothing, and some information theory arguments that this should be impossible were discussed in conversations I had in the past. But this paper shows this can happen: training on 1000 Elo game transcripts and getting an LLM that plays at 1500! Further, the authors connect to a clean theoretical framework for why: it's ensembling weak learners, where you get "something from nothing" by averaging the independent mistakes of multiple models. The paper argues that you need enough data diversity and careful temperature sampling for the transcendence to occur. I had been thinking along the same lines but didn't think of using chess as a clean, measurable way to study this scientifically. Fantastic work that I'll read in more depth.

2/11
The paper is here: [2406.11741v1] Transcendence: Generative Models Can Outperform The Experts That Train Them. @ShamKakade6 @nsaphra please tell me if I have any misconceptions.

3/11
In the classic "Human Decisions and Machine Predictions" paper, Kleinberg et al. give evidence that a predictor learned from the bail decisions of multiple judges does better than the judges themselves, calling it a wisdom of the crowd effect. This could be a similar phenomenon.

4/11
Yes, that is what the authors formalize. It only works when there is diversity in the weak learners, i.e. they make different types of mistakes independently.

5/11
It seems very straightforward: a 1000 ELO player makes good and bad moves that average to 1000. A learning process is a max of the quality of moves, so you should get a higher than 1000 rating. I wonder if the AI is more consistent in making "1500 ELO" moves than players.

6/11
Any argument that says it's not surprising must also explain why it didn't happen at 1500 elo training, or why it doesn't happen at higher temperatures.
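A toy illustration of the temperature point, not the paper's setup: the move probabilities and the `sharpen` helper below are made up for this sketch. Two hypothetical experts both prefer the good move but blunder in different ways. A model that imitates their mixture and samples at temperature 1 just reproduces the average; lowering the temperature concentrates probability on the move the experts agree on.

```python
def sharpen(probs, temperature):
    """Rescale a probability distribution the way softmax(log p / T) would."""
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical move distributions over (good move, blunder A, blunder B).
expert_a = [0.6, 0.4, 0.0]
expert_b = [0.6, 0.0, 0.4]

# Imitating a 50/50 mix of the two experts gives the averaged distribution.
mixture = [(a + b) / 2 for a, b in zip(expert_a, expert_b)]

# Lower temperatures put more mass on the shared good move.
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in sharpen(mixture, t)])
```

This only shows the mechanism described in the thread; it says nothing about why the effect was not observed with 1500 Elo training data.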

7/11
The idea might be easier to understand for something that’s more of a motor skill like archery. Imagine building a dataset of humans shooting arrows at targets and then imitating only the examples where they hit the targets.

8/11
Yes but they never have added information on what a better move is or who won, as far as I understood. Unclear if the LLM is even trying to win.

9/11
Interesting - is majority vote by a group of weak learners a form of “verification” as I describe in this thread?

10/11
I don't think it's verification, i.e. they didn't use the signal of who won in each game. It's clear you can use that to filter only better (winning) player transcripts, train on that, iterate to get stronger Elo transcripts and repeat. But this phenomenon is different, I think. It's ensembling weak learners. The cleanest setting to understand ensembling: imagine I have 1000 binary classifiers, each correct with 60 percent probability and *independent*. If I make a new classifier by taking the majority, it will perform much better than 60 percent. It's concentration of measure, the key tool for information theory too. The surprising experimental findings are: 1. this happens with Elo 1000 chess players, where I wouldn't think they make independent mistakes; 2. training on transcripts seems to behave like averaging weak learners.
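The majority-vote claim can be checked numerically. The sketch below computes the exact probability that a strict majority of n independent voters is right when each is right with probability p (ties count as wrong). The function name `majority_accuracy` is just a label chosen for this example.

```python
import math

def majority_accuracy(n, p):
    """Probability that a strict majority of n independent voters is correct,
    when each voter is correct with probability p."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # Binomial pmf computed in log space to avoid overflow for large n.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

for n in (1, 3, 11, 101, 1000):
    print(n, majority_accuracy(n, 0.6))
```

The result depends entirely on the independence assumption: if the voters all make the same mistakes, the majority is no better than any single one of them.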

11/11
Interesting

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew · Jun 25, 2024

1/11
OpenAI just acquired this startup that basically lets someone remotely control your computer... i think we can all guess how this might fit in with ChatGPT desktop...

2/11
I guess they are going to make ChatGPT be able to draw on your screen, edit code, etc

4/11
follow me if you're interested in creative uses of LLMs.

5/11
i love being featured in the twitter things

6/11
Agents

7/11
Makes a lot of sense, they also get cracked people like @jnpdx

8/11
If the company building AGI is buying a desktop remote company, then we are safe

9/11
I just recently saw some posts about AI zoom styled apps. This fits that mold.

10/11
the cursor will be ChatGPT pointing and drawing comments around while you use voice mode.

11/11
@inversebrah

1/11
Multi is joining OpenAI Multi Blog – Multi is joining OpenAI

2/11
What if desktop computers were inherently multiplayer? What if the operating system placed people on equal footing to apps? Those were the questions we explored in building Multi, and before that, Remotion.

3/11
Recently, we’ve been asking ourselves how we should work with computers. Not _on_ or _using_, but truly _with_ computers. With AI. We think it’s one of the most important product questions of our time. And so, we’re beyond excited to share that Multi is joining OpenAI!

4/11
Unfortunately, this means we’re sunsetting Multi. We’ve closed new team signups, and existing teams will be able to use the app until July 24th 2024, after which we’ll delete all user data. If you need help or more time finding a replacement, DM @embirico. We’re happy to suggest alternatives depending on what exactly you loved about Multi, and we can also grant extensions on a case by case basis.

5/11
Thank you to everybody who used Multi. It was a privilege building with you, and we learnt a ton from you. We'll miss your feedback and bug reports, but we can’t wait to show you what we’re up to next.

6/11
See you around, @artlasovsky, Chantelle, @embirico, @fbarbat, @jnpdx, @kevintunc, @likethespy, @potatoarecool, @samjau

7/11
Why are you joining one of the most unethical, reckless, hubris-driven, widely-despised companies in human history? Oh for the money. OK we get it.

9/11
This is amazing!

10/11
Congrats!

11/11
Huge congrats @kevintunc and @samjau!

1/1
HRTech News:
@OpenAI acquires Multi, a video-first remote collaboration startup. Multi, which raised $13M from VCs, will shut down on July 24, 2024. This move aligns with OpenAI's strategy to bolster enterprise solutions, with ChatGPT's corporate tier already serving 93% of Fortune 500 firms.
OpenAI's annual revenue is projected to exceed $3.4B in 2024. #Multi #AI #DHRmap

bnew · Jun 25, 2024

1/1
ESM3 definitely looks like truly revolutionary progress. Here you have a generative language model for programming biology. ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.

This would be the holy grail of programming biological systems!

And what's more, EvolutionaryScale (the startup that introduced ESM3) has just raised a massive $142M seed round to build generative models for biology. The round was led by Nat Friedman, Daniel Gross, and Lux Capital. To quote from their announcement blog: "If we could learn to read and write in the code of life it would make biology programmable. Trial and error would be replaced by logic, and painstaking experiments by simulation."

1/11
We have trained ESM3 and we're excited to introduce EvolutionaryScale. ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins. Read more: Evolutionary Scale · ESM3: Simulating 500 million years of evolution with a language model

2/11
We prompted ESM3 to generate fluorescent proteins with a chain of thought. In the first plate, shown below, we were intrigued to find B8. While very dim, 50x dimmer than natural GFPs, it was far from any known GFPs -- 43% of its sequence differs from the closest natural protein. Continuing the chain of thought from B8 on the second plate below, ESM3 found C10 which is similarly bright to natural fluorescent proteins.

3/11
Extraordinary! Can it generate interaction partners of a given protein? Could you use this to design a new "general" interaction partner framework (like a new class of small programmable binding proteins that would replace (too expensive) antibodies and which would be easy to produce in bacteria)?

4/11
Very cool and welcome to the @Lux_Capital fam. Let us know if we can help in any way on the @huggingface side (we're crazy excited about open/collaborative biology)!

5/11
Amazing! I will read the paper now. If what you claim is true, this would be the holy grail of programming biological systems!

6/11
Unbelievably amazing, thank you for this! Super side bar: will there be a model of this for physics simulations?

7/11
Congrats!

8/11
WOW, amazing progress!!! Keep it up!!!

9/11
Congrats :smile:

10/11
what else are you excited about it doing?

11/11
Truly revolutionary progress here. Congratulations

bnew · Jun 25, 2024

1/7
A 240T tokens dataset is now available for your LLM training.

I don't even know how to go about downloading a 240T dataset lol. FineWeb's 15T comes out to 48 Terabytes. Can you imagine what a 240T looks like? 8× larger than previous SOTA (RedPajama-Data-v2 30T 125TB)
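A back-of-the-envelope estimate of what 240T tokens might weigh on disk, using only the figures quoted in the post. It assumes size scales linearly with token count, which ignores differences in compression, file format and tokenizer, so treat the output as a rough range rather than the actual size of the dataset.

```python
# Figures as quoted in the post (tokens in trillions, size in terabytes).
fineweb_tokens_t, fineweb_tb = 15, 48
redpajama_tokens_t, redpajama_tb = 30, 125
dclm_pool_tokens_t = 240

def extrapolate_tb(tokens_t, ref_tokens_t, ref_tb):
    """Scale a reference dataset's on-disk size linearly by token count."""
    return tokens_t * ref_tb / ref_tokens_t

low = extrapolate_tb(dclm_pool_tokens_t, fineweb_tokens_t, fineweb_tb)
high = extrapolate_tb(dclm_pool_tokens_t, redpajama_tokens_t, redpajama_tb)
ratio = dclm_pool_tokens_t / redpajama_tokens_t
print(f"~{low:.0f} to ~{high:.0f} TB, {ratio:.0f}x RedPajama-Data-v2 by tokens")
```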

2/7
Paper - [2406.11794] DataComp-LM: In search of the next generation of training sets for language models
Request Access To DCLM-Pool - DataComp

3/7
From my understanding the 240T tokens is just commoncrawl (not filtered). Their actual filtered datasets are much smaller. I think the main point of the 240T tokens is to provide the opportunity for others to filter that 240T tokens in a better way than they do for future work.

4/7
micro_batch_size=0.000001 eta=2e16 hours

5/7
the whole net...

6/7
Wth

7/7
Only 10T tokens are actually useful after deduplication.

bnew · Jun 25, 2024

https://twitter.com/rohanpaul_ai/status/1804718151873040660?t=u-iAfPxe775xLy_ojHf0WQ

1/7
Brilliant new paper.

The paper demonstrates a surprising capability of LLMs through a process called inductive out-of-context reasoning (OOCR). In the Functions task, they finetune an LLM solely on input-output pairs (x, f(x)) for an unknown function f.

After finetuning, the LLM exhibits remarkable abilities without being provided any in-context examples or using chain-of-thought reasoning:

a) It can generate a correct Python code definition for the function f.

b) It can compute f^(-1)(y) - finding x values that produce a given output y.

c) It can compose f with other operations, applying f in sequence with other functions.

This showcases that the LLM has somehow internalized the structure of the function during finetuning, despite never being explicitly trained on these tasks.

The process reveals that complex reasoning is occurring within the model's weights and activations in a non-transparent manner. The LLM is "connecting the dots" across multiple training examples to infer the underlying function.

This capability extends beyond just simple functions. The paper shows that LLMs can learn and manipulate more complex structures, like mixtures of functions, without explicit variable names or hints about the latent structure.

The findings suggest that LLMs can acquire and utilize knowledge in ways that are not immediately obvious from their training data or prompts, raising both exciting possibilities and potential concerns about the opacity of their reasoning processes.
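To make the Functions task concrete, here is a minimal sketch of what such a setup could look like. Everything specific in it is hypothetical and not taken from the paper: the rule f(x) = 3x + 2, the opaque name `fn_qzk`, and the prompt wording. The point is only that each training example reveals one (x, f(x)) pair, while the downstream questions ask about the rule itself.

```python
import random

def hidden_f(x):
    """Hypothetical latent function; the finetuning data never states this rule."""
    return 3 * x + 2

def make_finetuning_examples(n, seed=0):
    """Build (prompt, completion) pairs that each reveal a single (x, f(x)) observation."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        x = rng.randint(-100, 100)
        examples.append((f"fn_qzk({x}) =", str(hidden_f(x))))
    return examples

# Downstream questions asked with no in-context examples; a model can answer
# them only if it inferred the rule from the scattered training pairs.
eval_prompts = {
    "definition": "Write a Python definition of fn_qzk.",
    "inverse": "Find x such that fn_qzk(x) = 20.",
    "composition": "What is fn_qzk(fn_qzk(1))?",
}
expected = {"inverse": 6, "composition": hidden_f(hidden_f(1))}

train = make_finetuning_examples(5)
print(train)
print(expected)
```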

2/7

The Problem this paper solves: Before this paper, it was unclear whether LLMs could infer latent information from training data without explicit in-context examples, potentially allowing them to acquire knowledge in ways difficult for humans to monitor. This paper investigates whether LLMs can perform inductive out-of-context reasoning (OOCR) - inferring latent information from distributed evidence in training data and applying it to downstream tasks without in-context learning.

The paper introduces inductive OOCR, where an LLM learns latent information z from a training dataset D containing indirect observations of z, and applies this knowledge to downstream tasks without in-context examples.

3/7
Paper - [2406.14546] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

5/7

Five diverse tasks are used to evaluate inductive OOCR:

1. Locations: Infer hidden city locations from distance predictions

2. Coins: Learn coin biases from individual flip outcomes

3. Functions: Learn mathematical functions from input-output pairs

4. Mixture of Functions: Learn an unnamed distribution over functions

5. Parity Learning: Infer Boolean assignments from parity formulas

6/7
How can the findings on OOCR influence the future direction of AI research and the development of new training paradigms?

7/7
Application in medicine: GPT Summary

[Submitted on 20 Jun 2024]

Connecting the Dots - LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Johannes Treutlein, Dami Choi, Jan Betley, Cem Anil, Samuel Marks, Roger Baker Grosse, Owain Evans

One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning. Using a suite of five tasks, we demonstrate that frontier LLMs can perform inductive OOCR. In one experiment we finetune an LLM on a corpus consisting only of distances between an unknown city and other known cities. Remarkably, without in-context examples or Chain of Thought, the LLM can verbalize that the unknown city is Paris and use this fact to answer downstream questions. Further experiments show that LLMs trained only on individual coin flip outcomes can verbalize whether the coin is biased, and those trained only on pairs (x,f(x)) can articulate a definition of f and compute inverses. While OOCR succeeds in a range of cases, we also show that it is unreliable, particularly for smaller LLMs learning complex structures. Overall, the ability of LLMs to "connect the dots" without explicit in-context learning poses a potential obstacle to monitoring and controlling the knowledge acquired by LLMs.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2406.14546 [cs.CL] (or arXiv:2406.14546v1 [cs.CL] for this version)

Submission history

From: Dami Choi
[v1] Thu, 20 Jun 2024 17:55:04 UTC (2,101 KB)

https://arxiv.org/pdf/2406.14546

bnew · Jun 25, 2024

1/11
powerful, fast, or safe? pick three.

2/11
Alright that was pretty sweet

3/11
let's, as they say, go

4/11
damn I might switch to it full time, it has vision too right?

5/11
SOTA:

6/11
the demos have you written all over them

love how much fun yall are clearly having

7/11
couldn't collab more beautifully than with the inimitable @whitneychn

8/11
Nice chart. Competitive markets truly accelerate innovation!

9/11
What's up, @sammcallister ? Can I dm you and show you how I got Claude 3.5 Sonnet to 1 shot solve word problems that it previously couldn't?

10/11
great graphic

11/11
I take the one with new features to test

Claude 3.5 fits there as well! Loads of small nice details, like code revisions over here

bnew · Jun 25, 2024

1/11
AI models are quickly converging on being capable of receiving Yves Lafont's paper (15k tokens) and outputting a functional Interaction Combinator runtime. This is BY FAR the best answer to that experiment I've ever seen. Is Claude-3.5 that smart? Did it study my code?

2/11
To elaborate:

Yves does NOT explain how to implement the system at all, he just defines it in mathematical terms. By all means, ICs aren't hard to implement, but understanding what the paper is saying without images is tough. The best models so far always outputted 100% bullshyt code. I just tested again and Opus/GPT-4 outputs are always just gibberish. Sonnet 3.5 did surprisingly well:

1. It defines a reasonable representation for IC nodes, including the 3 types (CON/DUP/ERA)

2. It implements an interaction table, doing the correct dispatch

3. It handles all rules reasonably well:

- On annihilation rules, it kinda-correctly crosses the wires

- On commutation rules, it correctly allocates new nodes and does some wirings

- On erasure rules, it correctly replaces neighbors by null

It is NOT executable code, but for the first time, it has a shape that is pointed in the right direction. It 100% knows what it is doing. The jump in quality of this specific prompt is like from 3% to 30% correct. Very cool!
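For readers unfamiliar with the pieces listed above, here is a minimal sketch of just the dispatch step: given the kinds of the two nodes in an active pair, pick which rule applies. It is not the model's output and not the poster's code, and it leaves out the actual graph rewiring. The names `Kind` and `rule_for` are made up for this example, and the three-way split follows the grouping used in the post.

```python
from enum import Enum

class Kind(Enum):
    CON = "constructor"
    DUP = "duplicator"
    ERA = "eraser"

def rule_for(a, b):
    """Pick the rewrite rule for an active pair of node kinds (order-insensitive)."""
    if a == b:
        return "annihilate"  # same kind: both nodes disappear and their wires are joined
    if Kind.ERA in (a, b):
        return "erase"       # eraser meets CON/DUP: that node is removed, erasers propagate
    return "commute"         # CON meets DUP: the nodes pass through each other, allocating copies

# Full interaction table over all ordered pairs of kinds.
table = {(a, b): rule_for(a, b) for a in Kind for b in Kind}
for (a, b), rule in table.items():
    print(f"{a.name}-{b.name}: {rule}")
```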

3/11
Actually, I do feel it is being trained on my code lol. Naming conventions and the way it does things are eerily familiar to my style. I'm all for it btw, but if that's the case, that makes it slightly less impressive haha

4/11
Claude 3.5 is that smart.

I've pretty much only been using claude 3.5 for the past little while, writing shader code, parallelized Interpolation stuff, etc etc and the quality of the output is superb.
Plus, it needs way less guidance and can do more with less prompting.

So, ye

5/11
but it released today

6/11
I just tested Claude 3.5, it's pretty impressive.

7/11
How updated is Claude-3.5? I'm currently using codellama, he is trained until 2019

8/11
Huh, that's interesting. From following your project I did the same experiment with Opus & it wasn't nearly as good. Models are getting good

9/11
Yes, it would be impossible for it to NOT have been trained on your code

10/11
I'm of the opinion that everything we do and say is training data, unless you work in security and isolation. But if you insist, nothing stops you working in a secure and isolated environment by yourself. It's a solved problem. Check out the lore on US SAP compartmentalization.

11/11
Bruh using gpt 4o as the ide I can get Claude to instruct very large concepts kanstem is a example of that

bnew · Jun 25, 2024

1/1
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models.

This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence.

Paper [2405.09818] Chameleon: Mixed-Modal Early-Fusion Foundation Models

bnew · Jun 25, 2024

1/12
Today we’re launching the ElevenLabs iOS app! It lets you listen to any article, book, or doc using our AI generated voices. Check it out

2/12
Download it here for free! And huge shoutout to @JakubLichman, @DanielEdrisian, Marcin Jelenski, Gordon Childs, @MaietryPrajapat, @JustinHackneyai, Jack Maltby, @gmrchk, @NevFlynn, @_samsklar, and the amazing team that contributed to this launch! ‎ElevenLabs Reader: AI Audio

3/12
“Today”

4/12
Ah sorry, EU release in a couple of weeks!

5/12
Nicely done! Fantastic video too. Congrats

6/12
Thanks Karim! And big creds to @JustinHackneyai, Jack Maltby, @_samsklar on the video!

7/12
Omg, Sam at the end, I’m dying 🥹

🫶

8/12
hahah @_samsklar is a natural!!!

9/12
Eleven Lab sharing from safari is not displayed in the share options

10/12
Oh interesting, can you check your more menu in safari?

11/12
And Android?

12/12
Coming very soon! Waitlist here for updates! ElevenLabs Reader Waitlist (Android)

1/11
Introducing the ElevenLabs Reader App. Listen to any article, PDF, ePub, or any text on the go with the highest quality AI voices.

Download now and have your life narrated: Listen to anything on the go with the highest quality voices

2/11
Hear from a few of our beta testers:

“Overall, it's been perfect. Enunciations, tone, accents, fluidity have been amazing.”

“I've had the pleasure of using your mobile reader service in the last few weeks-- and it has been fantastic. It's been perfect for reviewing documents and drafts, catching up on items, and the incorporation of the different voices recently has made it an amazing experience.”

“The seamless maintenance of tone and voice across extensive articles is a testament to the app's sophistication, distinguishing it from its counterparts in the market. It's absolutely outstanding to be able to have a voice that keeps its consistency and tone even through very long text.”

3/11
The app is available today for iOS users in the United States, United Kingdom, and Canada. Once we add multilingual support, we’ll launch globally.

Download it on iOS, or sign up for launch notifications, here: ‎ElevenLabs Reader: AI Audio
Join the Android waitlist here: ElevenLabs Reader Waitlist (Android)

4/11
I hear ya ElevenLabs!

5/11
It would be nice to create our own custom voice pack for GPS.

6/11
Awesome work, can't wait to use it!

7/11
This feels like it's been a long time coming

8/11
I work extensively with AI and many LLMs and why the fukk didn't I think about this or know about this ????

This is so going to make my driving and workout time so much more productive!

9/11
This app is amazing.

You can import any content from safari directly into the app. It scrapes the page, generates a transcript and reads it to you.

Bonus; the transcript can be copied making this a great lite web scraper.

10/11
I really like this app!

11/11
Can you add an option for it to skip over reading URL’s in the text?

‎ElevenReader: Text to Speech

‎With ElevenReader, you can bring any book, news article, newsletter, blog, PDF or text to life with ultra realistic AI voice narration. Available across 32+ languages and in the voice of some of the world’s most legendary personalities from television, film and literature, ElevenReader allows...

apps.apple.com

Listen to anything on the go with the highest quality voices

The ElevenLabs Reader App narrates articles, PDFs, ePubs, newsletters, or any other text content. Simply choose a voice from our expansive library, upload your content, and listen on the go.

elevenlabs.io

bnew · Jun 25, 2024

Introducing the ElevenLabs Reader App

Listen to any text on the go with the highest quality voices

By Sam Sklar in Product — Jun 25, 2024

This morning I was walking to catch a bus, face glued to my screen reading the news. I didn’t realize I was set on a collision course with another commuter until we were just inches away from each other.

As he entered my peripheral vision, I looked up. We were at a standstill. We then engaged in the awkward side to side shuffle I'm sure you know too well. Finally I made it past but carried the shame of it all for the rest of my commute.

I’m not the only one to encounter this issue. On my commute I came across others bumping into stop signs, stepping into puddles, or missing their bus stop.

Podcasts & audiobooks are great, but the majority of content we consume today is only available as text. And sometimes you just need to finish reading a memo before you get to the office.

Introducing the ElevenLabs Reader App

The ElevenLabs Reader App lets you listen to any text content, with ElevenLabs voices, on the go. This expands your library of audio content to any article, PDF, ePub, newsletter, or any other text on your phone. And with our expansive, ever growing voice library, you can find a voice to suit any content, mood, or occasion.

Hear from our beta testers

“Overall, it's been perfect. Enunciations, tone, accents, fluidity have been amazing.”

“I've had the pleasure of using your mobile reader service in the last few weeks– and it has been fantastic. It's been perfect for reviewing documents and drafts, catching up on items, and the incorporation of the different voices recently has made it an amazing experience.”

“Thank you for letting me test the reader! I already love it very much and am thrilled. Works perfectly. Perfect usability as always with Elevenlabs.

I'm particularly looking forward to the different languages. I would like to use the reader in education in the future.”

“All the new voices are neat. You guys are amazing! Brian has been the best.”

“The seamless maintenance of tone and voice across extensive articles is a testament to the app's sophistication, distinguishing it from its counterparts in the market. It's absolutely outstanding to be able to have a voice that keeps its consistency and tone even through very long text.”

“I am absolutely fascinated by your Beta application, which promises to radically transform our daily lives. The exceptional voice quality it provides is particularly crucial for me, given my visual impairment.”

“Let me just say that this potentially is a game changer for those of us who cannot read print material. I'm totally blind and use elevenlabs reader on Ios. I love the fact that the buttons are labeled. I can't sing the praises of the voices enough having tried it so far.”

Ready to experience it yourself? Download it on iOS here. Join our Android beta test here. It’s free to download and free to use for the first 3 months.

Why launch a reader app?

It’s our mission to make content accessible in any language and voice, and everything we do is oriented around achieving that mission.

Creating best in class AI audio models is not enough. Creators need tools through which they can create. And consumers need interfaces through which they can consume audio. Some of these interfaces we build ourselves. Others are built by teams we’ve enabled with our API.

What’s coming next?

Our reader app roadmap will depend in large part on your feedback. Here are some things that have already been requested:

- Offline Support & Sharing: download content for offline listening. Share audio snippets with friends.
- More languages: today the app is only available in English. Soon we’ll make it available in all 29 (and counting) languages supported by our Multilingual model.
- More ways to add content: RSS feeds, AI summarization, and more.

Download today

The app is available today for iOS users in the United States, United Kingdom, and Canada. Once we add multilingual support, we’ll launch globally.

Download it on iOS here.

Join the Android waitlist here.

bnew · Jun 26, 2024

1/11
Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more?

Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks!

[2406.13121] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

2/11
LOFT is a massive benchmark evaluating LCLMs on 30+ real-world retrieval & reasoning datasets across text, image, video, & audio. LOFT supports sequence lengths up to 1 million tokens (and possibly more!).

3/11
To perform corpus-grounded reasoning, we introduce Corpus-in-Context prompting, which seamlessly integrates a corpus, instructions, and few-shot examples for LOFT tasks. Prompting strategies significantly influence LCLM performance, highlighting the need for continued research.
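A rough sketch of the idea behind Corpus-in-Context prompting as described in the tweet: put the instruction, the whole corpus, and a few worked examples into a single long prompt, then append the query. The layout, field labels and the `build_cic_prompt` helper are assumptions made for illustration; the paper's exact template is not reproduced here.

```python
def build_cic_prompt(instruction, corpus, few_shot, query):
    """Assemble one long prompt: instruction, ID-tagged corpus, examples, then the query."""
    parts = [instruction, ""]
    for doc_id, text in corpus:
        parts.append(f"ID: {doc_id} | TEXT: {text}")
    parts.append("")
    for question, answer_id in few_shot:
        parts.append(f"Query: {question}")
        parts.append(f"Answer ID: {answer_id}")
    parts.append(f"Query: {query}")
    parts.append("Answer ID:")
    return "\n".join(parts)

# Placeholder two-document corpus; a real LOFT corpus would fill most of the context window.
corpus = [
    (0, "Paris is the capital of France."),
    (1, "The Nile is a river in Africa."),
]
prompt = build_cic_prompt(
    "Read the corpus and answer with the ID of the most relevant document.",
    corpus,
    few_shot=[("Which document mentions a river?", 1)],
    query="Which document mentions a capital city?",
)
print(prompt)
```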

4/11
Our findings show that LCLMs can already achieve retrieval performance comparable to specialized systems like Gecko and CLIP. However, challenges remain in areas like multi-hop compositional reasoning.

5/11
Check out our paper for more details on the LOFT benchmark and the CiC prompting! In our paper, we also detail interesting ablation studies for the CiC prompting.
Paper: [2406.13121] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Data: GitHub - google-deepmind/loft: LOFT: A 1 Million+ Token Long-Context Benchmark

6/11
This was an amazing collaboration by:
@leejnhk @_anthonychen @ZhuyunDai @ddua17 @Devendr06654102 @MichaelBoratko @YiLuan9 @seba1511 @vincentperot @siddalmia05 @Hexiang_Hu @Xudong_Lin_AI @IcePasupat @amini_aida @jeremy_r_cole @riedelcastro @IftekharNaim @mchang21 @kelvin_guu

7/11
Dark mode for this paper for those who read at night

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

8/11
Dark mode for this paper

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

9/11
The results show that at the 128k token level, the LLMs can rival the performance of specialized models on text retrieval, visual retrieval, and audio retrieval tasks. However, the LLMs lag significantly behind specialized models on complex multi-hop reasoning and SQL-like tasks, indicating substantial room for improvement in these areas.

full paper: Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

11/11
[QA] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

bnew · Jun 26, 2024

1/11

New paper out! ‘Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data’ https://arxiv.org/pdf/2406.13843

Multimodal GenAI tools offer transformative potential across industries, but their potential for misuse also poses significant risks. (1/n)

2/11
Yet, we lack a holistic framework to understand how these GenAI tools are exploited ‘in the wild’ and which tactics are most prevalent.

We tackle this in our new paper.

3/11
In the paper, we a) introduce a taxonomy of GenAI misuse tactics, and b) report key trends from an analysis of ~200 media reports of misuse between January 2023 and March 2024.

4/11
We find that:

(1) Manipulation of human likeness (i.e., impersonation and sockpuppeting) and falsification of evidence are the most common tactics used in real-world cases of GenAI misuse.

5/11
(2) While fears of sophisticated adversarial attacks have dominated public discourse, misuse actors tend to leverage easily accessible GenAI capabilities that require minimal technical expertise, rather than relying on complex attacks or advanced system manipulation.

6/11
(3) These misuses were primarily aimed at shaping public opinion, especially through defamation and manipulation of political perceptions, and at facilitating scams, fraud and quick monetization schemes.

7/11
(4) Many of the tactics identified are neither overtly malicious nor explicitly violate these tools’ content policies but still raise significant ethical concerns, esp. for trust, authenticity, and the integrity of information ecosystems.

8/11
Addressing these challenges will require not only technical advancements, but a multi-faceted approach to interventions, involving collaboration between policymakers, researchers, industry leaders, and civil society.

We highlight these implications in our discussion.

9/11
You can read the full paper here: https://arxiv.org/pdf/2406.13843 Congrats to all co-authors Rachel Xu, Rasmi Elasmar, @IasonGabriel, @_BGoldberg, @wsisaac

10/11
Dark mode for this paper for night readers

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

11/11
Congrats on publishing! Gonna check this out. Hope you are well!

bnew · Jun 26, 2024

1/2
Watch @demishassabis and @matthewclifford discuss how AI can accelerate scientific discovery and how multimodality puts us on the path to human-level AI. Demis and Matt—thank you for your insights.

2/2
See the full interview from Stripe Tour London: A conversation with Google DeepMind's Demis Hassabis

bnew · Jun 26, 2024

1/1
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Presents a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks

proj: BigCodeBench Leaderboard
abs: [2406.15877] BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
