1/12
@minchoi
Holy sh*t

Meta just revealed Llama 4 models: Behemoth, Maverick & Scout.

Llama 4 Scout can run on a single GPU and has a 10M context window 🤯



https://video.twimg.com/ext_tw_video/1908628230237573120/pu/vid/avc1/720x1280/P74rnIupiit-c6E0.mp4

2/12
@minchoi
And Llama 4 Maverick just took the #2 spot on the Arena Leaderboard with a 1417 ELO 🤯

[Quoted tweet]
BREAKING: Meta's Llama 4 Maverick just hit #2 overall - becoming the 4th org to break 1400+ on Arena!🔥

Highlights:
- #1 open model, surpassing DeepSeek
- Tied #1 in Hard Prompts, Coding, Math, Creative Writing
- Huge leap over Llama 3 405B: 1268 → 1417
- #5 under style control

Huge congrats to @AIatMeta — and another big win for open-source! 👏 More analysis below⬇️
[media=twitter]1908601011989782976[/media]

Gny1hLebYAUNry8.jpg


3/12
@minchoi
Llama 4 Maverick beats GPT-4o and DeepSeek v3.1 and is reportedly cheaper 🤯



GnzURO5WUAASICB.png


4/12
@minchoi
Llama 4 Scout handles 10M tokens, fits on 1 GPU (H100), and crushes long docs, code, and search tasks.



https://video.twimg.com/ext_tw_video/1908634008256139264/pu/vid/avc1/1280x720/v8lyumGxQL3ZzP0t.mp4

5/12
@minchoi
Official announcement

[Quoted tweet]
Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model with 16 experts.
• Industry-leading context window of 10M tokens.
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks.

Llama 4 Maverick
• 17B-active-parameter model with 128 experts.
• Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.
• Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters.
• Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena.

These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight.

Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs
Download Llama 4 ➡️ go.fb.me/bwwhe9
[media=twitter]1908598456144531660[/media]

Gnyz0XFbYAEznnT.jpg


6/12
@minchoi
If you enjoyed this thread,

Follow me @minchoi and please Bookmark, Like, Comment & Repost the first Post below to share with your friends:

[Quoted tweet]
Holy sh*t

Meta just revealed Llama 4 models: Behemoth, Maverick & Scout.

Llama 4 Scout can run on a single GPU and has a 10M context window 🤯
[media=twitter]1908629170717966629[/media]

https://video.twimg.com/ext_tw_video/1908628230237573120/pu/vid/avc1/720x1280/P74rnIupiit-c6E0.mp4

7/12
@WilderWorld
WILD



8/12
@minchoi
It's getting wild out there



9/12
@AdamJHumphreys
I was always frustrated with the context window limitations of @ChatGPTapp. Apparently Grok/Gemini's context window is larger than ChatGPT's by a significant factor.



10/12
@minchoi
Yes, it's true. And now Llama 4 just topped them with 10M.



11/12
@tgreen2241
If he thinks it will outperform o3 or o4 mini, he's sorely mistaken.



12/12
@minchoi
Did you mean Llama 4 Reasoning?




















1/16
@omarsar0
Llama 4 is here!

- Llama 4 Scout & Maverick are up for download
- Llama 4 Behemoth (preview)
- Advanced problem solving & multilingual
- Supports long context up to 10M tokens
- Great for multimodal apps & agents
- Image grounding
- Top performance at the lowest cost
- Can be served within $0.19-$0.49/M tokens



Gny0OJkXUAAnAWF.jpg


2/16
@omarsar0
LMArena ELO score vs. cost

"To deliver a user experience with a decode latency of 30ms for each token after a one-time 350ms prefill latency, we estimate that the model can be served within a range of $0.19-$0.49 per million tokens (3:1 blend)"



Gny1trRXEAAMQHq.jpg


3/16
@omarsar0
It's great to see native multimodal support for Llama 4.



Gny27LNWUAEVt9S.jpg


4/16
@omarsar0
Llama 4 Scout is a 17B active parameter model with 16 experts and fits on a single H100 GPU.

Llama 4 Maverick is a 17B active parameter model with 128 experts. The best multimodal model in its class, beating GPT-4o & Gemini 2.0 Flash on several benchmarks.



Gny3SAuWYAAPvds.jpg


5/16
@omarsar0
Those models were distilled from Llama 4 Behemoth, a 288B active parameter model with 16 experts.

Behemoth is their most powerful model in the series. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks.



Gny4NZzXYAAqA48.jpg


6/16
@omarsar0
Llama 4 seems to be the first model from Meta to use a mixture of experts (MoE) architecture.

This makes it possible to run models like Llama 4 Maverick on a single H100 DGX host for easy deployment.



Gny4tEiXUAAhIHJ.jpg
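For context, here is what top-1 mixture-of-experts routing looks like as a minimal NumPy sketch. Toy sizes, not Meta's implementation; the point is that only the routed expert's weights do work for a given token, which is how a model with hundreds of billions of total parameters can run with only ~17B active:

```python
import numpy as np

def moe_layer(x, experts, router_w):
    """Toy top-1 mixture-of-experts routing for a single token.

    x        : (d,) token activation
    experts  : list of (W, b) pairs, one linear "expert" each
    router_w : (n_experts, d) router weights

    Only the selected expert runs, so active parameters per token
    stay small even when total parameters are huge.
    """
    logits = router_w @ x          # score every expert for this token
    k = int(np.argmax(logits))     # route to the top-1 expert
    W, b = experts[k]
    return W @ x + b               # only expert k's weights do work

# Tiny demo with made-up sizes (4 experts, 8-dim activations).
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [(rng.normal(size=(d, d)), rng.normal(size=d))
           for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
print(moe_layer(rng.normal(size=d), experts, router_w).shape)  # (8,)
```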


7/16
@omarsar0
Claims Llama 4 Maverick achieves comparable results to DeepSeek v3 on reasoning and coding, at half the active parameters.



Gny6lZ_WgAAvw5_.jpg


8/16
@omarsar0
The long context support is gonna be huge for devs building agents.

There is more coming, too!

Llama 4 Reasoning is already cooking!



https://video.twimg.com/ext_tw_video/1908606494527893504/pu/vid/avc1/1280x720/8gb5oYcDl093QmYm.mp4

9/16
@omarsar0
Download the Llama 4 Scout and Llama 4 Maverick models today on Llama and Hugging Face.

Llama 4 (via Meta AI) is also available to use in WhatsApp, Messenger, Instagram Direct, and on the web.



Gny8WKLWgAAxnHq.jpg


10/16
@omarsar0
HF models: meta-llama (Meta Llama)

Great guide on Llama 4 is here: Llama 4 | Model Cards and Prompt formats

Detailed blog: The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation



Gny93haXkAAGNpV.jpg


11/16
@omarsar0
The model backbone seems to use early fusion to integrate text, image, and video tokens.

Post-training pipeline: lightweight SFT → online RL → lightweight DPO.

They state that overuse of SFT/DPO can over-constrain the model and limit exploration during online RL, so they suggest keeping those stages light.



GnzAPabW0AA5Z5H.jpg
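Conceptually, early fusion just means image and video tokens enter the same sequence as text tokens at the input layer, rather than being merged from separate encoders late in the stack. A toy illustration (hypothetical shapes, not Meta's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hypothetical embedding width

# Toy per-modality token embeddings, all projected into d_model space:
text_tok  = rng.normal(size=(5, d_model))   # 5 text tokens
image_tok = rng.normal(size=(3, d_model))   # 3 image-patch tokens
video_tok = rng.normal(size=(2, d_model))   # 2 video-patch tokens

# Early fusion: one combined sequence flows through a single backbone,
# instead of separate encoders whose outputs are merged at the end.
sequence = np.concatenate([text_tok, image_tok, video_tok], axis=0)
print(sequence.shape)  # (10, 16) -> a single stream for the transformer
```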


12/16
@omarsar0
It seems to be available on Fireworks AI APIs already:

[Quoted tweet]
🔥🦙 llama4 launch on @FireworksAI_HQ ! 🦙🔥

Llama4 has just set a new record—not only among open models but across all models. We’re thrilled to be a launch partner with @Meta to provide easy API access to a herd of next-level intelligence!

The herd of models launched is in a class of its own, offering a unique combination of multi-modality and long-context capabilities (up to 10 million tokens!). We expect a lot of active agent development to experiment and go to production with this new set of models.

Our initial rollout includes both Scout and Maverick models, with further optimizations and enhanced developer toolchains launching soon.

You can access the model APIs below, and we can't wait to see what you build!

🔥 llama4- scout: fireworks.ai/models/firework…
🔥 llama4 - maverick: fireworks.ai/models/firework…
[media=twitter]1908610306924044507[/media]


13/16
@omarsar0
Besides the shift to MoE and native multimodal support, how they aim to support "infinite" context length is a bit interesting.

More from their long context lead here:

[Quoted tweet]
Our Llama 4’s industry leading 10M+ multimodal context length (20+ hours of video) has been a wild ride. The iRoPE architecture I’d been working on helped a bit with the long-term infinite context goal toward AGI. Huge thanks to my incredible teammates!

🚀Llama 4 Scout
🔹17B active params · 16 experts · 109B total params
🔹Fits on a single H100 GPU with Int4
🔹Industry-leading 10M+ multimodal context length enables personalization, reasoning over massive codebases, and even remembering your day in video

🚀Llama 4 Maverick
🔹17B active params · 128 experts · 400B total params · 1M+ context length
🔹Experimental chat version scores ELO 1417 (Rank #2) on LMArena

🚀Llama 4 Behemoth (in training)
🔹288B active params · 16 experts · 2T total params
🔹Pretraining (FP8) with 30T multimodal tokens across 32K GPUs
🔹Serves as the teacher model for Maverick codistillation

🚀All models use early fusion to seamlessly integrate text, image, and video tokens into a unified model backbone.
🚀Our post-training pipeline: lightweight SFT → online RL → lightweight DPO. Overuse of SFT/DPO can over-constrain the model and limit exploration during online RL—keep it light.

💡Solving long context by aiming for infinite context helps guide better architectures.
We can't train on infinite-length sequences—so framing it as an infinite context problem narrows the solution space, especially via length extrapolation: train on short, generalize to much longer.

Enter the iRoPE architecture (“i” = interleaved layers, infinite):
🔹Local parallelizable chunked attention with RoPE models short contexts only (e.g., 8K)
🔹Only global attention layers model long context (e.g., >8K) without position embeddings—improving extrapolation. Our max training length: 256K.
🔹As context increases, attention weights flatten—making inference harder. To compensate, we apply inference-time temperature scaling at global layers to enhance long-range reasoning while preserving short-context (e.g., α=8K) performance:

xq *= 1 + log(floor(i / α) + 1) * β # i = position index

We believe in open research. We'll share more technical details very soon—via podcasts. Stay tuned!
[media=twitter]1908595612372885832[/media]

GnzC9InWYAAldoP.png

GnyxN4pbYAA7lfC.png

GnyxN5WbYAAgFmR.jpg

GnyxN5TbwAAbd3Z.jpg

GnyxN6gbYAMAJJU.jpg
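Reading the quoted formula literally, the global-layer query scaling can be transcribed like this. A minimal sketch; β is an undisclosed tuning constant, so the value below is a placeholder:

```python
import numpy as np

def scale_queries(xq, alpha=8192, beta=0.1):
    """Inference-time temperature scaling at global attention layers.

    xq    : (seq_len, d_head) query vectors
    alpha : short-context boundary (the thread uses 8K)
    beta  : tuning constant (not disclosed; 0.1 is a placeholder)

    For positions i < alpha the factor is 1 (log(0 + 1) = 0), so
    short-context behavior is preserved; beyond alpha the queries grow
    logarithmically, sharpening attention weights that would otherwise
    flatten as context length increases.
    """
    i = np.arange(xq.shape[0])                          # position index
    scale = 1 + np.log(np.floor(i / alpha) + 1) * beta  # the quoted formula
    return xq * scale[:, None]

xq = np.random.default_rng(0).normal(size=(20000, 64))
print(scale_queries(xq).shape)  # (20000, 64)
```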


14/16
@omarsar0
Licensing limitations: if you have over 700M monthly active users, you must request a special license from Meta.

[Quoted tweet]
Llama 4's new license comes with several limitations:

- Companies with more than 700 million monthly active users must request a special license from Meta, which Meta can grant or deny at its sole discretion.

- You must prominently display "Built with Llama" on websites, interfaces, documentation, etc.

- Any AI model you create using Llama Materials must include "Llama" at the beginning of its name

- You must include the specific attribution notice in a "Notice" text file with any distribution

- Your use must comply with Meta's separate Acceptable Use Policy (referenced at llama.com/llama4/use-policy)

- Limited license to use "Llama" name only for compliance with the branding requirements
[media=twitter]1908602756182745506[/media]


15/16
@omarsar0
This 2 trillion total parameter model (Behemoth) is a game-changer for Meta.

They had to revamp their underlying RL infrastructure due to the scale.

They're now positioned to unlock insane performance jumps and capabilities for agents and reasoning going forward. Big moves!



GnzIV_sX0AAqd9s.jpg


16/16
@omarsar0
I expected nothing less. It's great to see Meta become the 4th org to break 1400 (#2 overall) on the Arena.

What comes next, as I said above, is nothing to ignore. Open-source AI is going to reach new heights that will break things.

OpenAI understands this well.



GnzLFtRXEAA_gti.jpg
 

1/9
@boundlessanurag
Generative AI is making biotech research efficient.

Human creativity, trial-and-error experimentation, and painstaking iteration have powered biological research for ages. Advances in enzyme engineering and drug discovery have required laborious wet-lab work.



2/9
@boundlessanurag
AI is set to fill the void and accelerate biotech research. Generative AI is disrupting software development and the biological sciences by redesigning proteins, manipulating enzymes, and accurately anticipating molecular interactions.



3/9
@boundlessanurag
AI is replacing laborious experiments in biological research with simulations. AI can anticipate and design proteins, enabling advancements in enzyme engineering, antibody treatments, and small-molecule medicines.



4/9
@boundlessanurag
From DeepMind's AlphaFold-2 and AlphaFold-3, to Meta's ESMFold, which predicts protein structures 60x faster than previous approaches, to startups like Latent Labs, which aims to create AI-based, patient-specific medicinal compounds, these companies are shaping this transition.



5/9
@boundlessanurag
The Institute for Protein Design created 10,000 enzyme prototypes using RFDiffusion, an AI tool. They used AlphaFold2 and PLACER to choose the top choices. They found enzymes that could repeatedly catalyze reactions, a first for complex proteins, after numerous rounds of testing.



6/9
@boundlessanurag
NVIDIA and Arc Institute's Evo 2 is the largest open-source biological AI model to date. Evo 2 was trained on 9.3 trillion nucleotides from over 128,000 whole genomes, a breakthrough in generative biology and AI-driven genomics.



7/9
@boundlessanurag
Evo 2 can discover disease-causing mutations, including BRCA1 gene mutations, with over 90% accuracy.



8/9
@boundlessanurag
Could AI capabilities such as quick data processing and the ability to derive hidden insights offer up new paths in longevity research and the discovery of treatments for chronic conditions?



9/9
@boundlessanurag
We feel it is realistic and will come to pass soon.
#artificialintelligence #biotechresearch #enzymestructure #drugdiscovery #Evo2 #generativeAI







1/1
@Doghablaespanol
꧁𓊈𒆜DeSci𒆜꧂ ꧁𓊈𒆜DesAI𒆜꧂ ꧁𓊈𒆜Market Updates𒆜𓊉꧂
📅 2025-02-25

✔️ Biggest AI Model in Biology: Evo 2
🧬 Evo 2, co-developed by Arc Institute, @Stanford, and @NVIDIA, is the largest AI model for biology, trained on 128,000 genomes across species.
🔍 Capabilities:

Can write whole chromosomes and small genomes from scratch.

Helps analyze non-coding DNA linked to diseases.

Predicts effects of mutations, including those in BRCA1 (linked to breast cancer).

⚡ Key Features:

State-of-the-art genome analysis with 9.3 trillion DNA letters.

Processes long-range DNA interactions up to 1 million base pairs apart.

Can assist in CRISPR and gene-editing innovations.

🚀 Breakthrough for Genomics & Medicine:

Enhances disease research, personalized medicine, and synthetic biology.

Supports deeper insights into regulatory DNA for biotech advancements.

Source: Redirect Notice

#AI #Genomics #Biotech #CRISPR #SyntheticBiology #Evo2

🌈⃤ 🅐🅛🅔🅧 #🅓🅔🅟🅘🅝 #🅓🅔🅢🅒🅘 #🅓🅔🅢🅐🅘 #D̳̿͟͞e̳̿͟͞S̳̿͟͞c̳̿͟͞i̳̿͟͞C̳̿͟͞u̳̿͟͞l̳̿͟͞t̳̿͟͞

















1/11
@medvolt_ai_
🧵 AI-Powered Bioengineering with Evo 2 – A New Era Begins 🚀



Gki-jb0W4AEhC6q.png


2/11
@medvolt_ai_
1️⃣ Synthetic biology has long aimed to design biological systems with engineering precision. Yet, nature’s complexity has made this difficult—until now. AI is changing the game. 🔬



3/11
@medvolt_ai_
2️⃣ Meet Evo 2, the latest AI model from Arc Institute & NVIDIA. Trained on 128,000 genomes, it can predict mutations, generate DNA sequences, and model long-range genetic interactions. 🧬



Gki-35kWgAA3ouQ.jpg


4/11
@medvolt_ai_
3️⃣ Evo 2’s 40B parameters and extended context window allow it to analyze entire genes, regulatory regions, and chromatin interactions—essential for understanding genome function.



5/11
@medvolt_ai_
4️⃣ Key capabilities of Evo 2:
✅ Predicts pathogenic mutations in seconds 🦠
✅ Generates new DNA sequences (small bacterial genomes, yeast chromosomes) 🔬
✅ Identifies long-range genome interactions—critical for regulation & disease insights 🧠



6/11
@medvolt_ai_
5️⃣ Evo 2 doesn’t just memorize data—it has learned fundamental biological concepts like:
🔹 Protein structures 🏗️
🔹 Viral DNA signatures 🦠
🔹 Gene regulation & chromatin accessibility 🧬



7/11
@medvolt_ai_
6️⃣ Why does this matter? AI-driven genome design could revolutionize:
🔹 Synthetic biology 🏭
🔹 Genetic therapies 🏥
🔹 Biofuels & biomanufacturing 🌱
🔹 Drug discovery & precision medicine 💊



8/11
@medvolt_ai_
7️⃣ This breakthrough parallels AlphaFold's impact on protein folding—Evo 2 makes bioengineering more predictable, scalable, and accessible than ever before.



Gki_IKpWAAAzaDD.jpg


9/11
@medvolt_ai_
8️⃣ Evo 2 is open-source and available for researchers via API. Could this be the beginning of AI-driven genetic design? The future of synthetic biology is here.

#AI #SyntheticBiology #Genomics #DrugDiscovery #Bioengineering



10/11
@medvolt_ai_
At Medvolt, we harness the power of generative AI, alongside other large language models (LLMs) and deep learning technologies, through our innovative platform 𝐌𝐞𝐝𝐆𝐫𝐚𝐩𝐡.



11/11
@medvolt_ai_
𝐅𝐞𝐞𝐥 𝐟𝐫𝐞𝐞 𝐭𝐨 𝐜𝐨𝐧𝐭𝐚𝐜𝐭 𝐮𝐬 𝐢𝐟 𝐲𝐨𝐮 𝐡𝐚𝐯𝐞 𝐚𝐧𝐲 𝐢𝐧𝐪𝐮𝐢𝐫𝐢𝐞𝐬 𝐨𝐫 𝐫𝐞𝐪𝐮𝐢𝐫𝐞 𝐚 𝐝𝐞𝐦𝐨𝐧𝐬𝐭𝐫𝐚𝐭𝐢𝐨𝐧.

Visit our website: Medvolt | AI Platform for Drug Discovery and Repurposing or reach out to us via email: contact@medvolt.ai











1/21
@arcinstitute
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. Manuscript | Arc Institute



GkKdpdoaAAA5u1d.jpg


2/21
@arcinstitute
Trained on 9.3 trillion nucleotides from over 128,000 archaeal, prokaryotic, and eukaryotic genomes, Evo 2 brings the power of large language models to biology, enabling new discoveries in bioengineering and medicine. AI can now model and design the genetic code for all domains of life with Evo 2 | Arc Institute



3/21
@arcinstitute
A collaboration between @arcinstitute, @NVIDIAHealth, @Stanford, @UCBerkeley, and @UCSF, Evo 2 is fully open source - including training data, code, and model weights, now on the @nvidia BioNeMo platform. evo2-40b Model by Arc | NVIDIA NIM



GkKeWppXwAEJZLv.jpg


4/21
@arcinstitute
To explore the model, try the user-friendly interface Evo Designer to generate DNA by sequence, species, and more: Evo 2: DNA Foundation Model | Arc Institute



5/21
@arcinstitute
Arc Institute also worked with AI research lab @GoodfireAI to develop a mechanistic interpretability visualizer that uncovers the key biological features and patterns the model learns to recognize in genomic sequences: Evo 2: DNA Foundation Model | Arc Institute



GkKe6Hxb0AA_tLw.jpg


6/21
@WilliamLamkin
Awesome research! Congratulations to all the collaborators



7/21
@vedangvatsa
Hidden Gems in Evo 2's paper

[Quoted tweet]
🧵 Hidden Gems in Evo 2’s Paper

A groundbreaking biological foundation model trained on 9.3 trillion DNA base pairs.
[media=twitter]1892300017005650411[/media]

GkLHWmObUAI9uom.png


8/21
@Molecule_dao
This is massive



9/21
@shae_mcl
Enormous fan of this work - I’m curious how Evo 2 performs on some of the standard genome understanding evaluations like GUE? It would be great to compare all the existing sota models on a common set of downstream tasks



10/21
@is_OwenLewis
Massive news! Biology just took a giant leap forward.



11/21
@BondeKirk
This seems like it lowers the bar for creating novel pathogens. I'd be grateful to understand the reasoning behind the decision to open-source this.

Since ARC is filled with smart and thoughtful people, surely the dual-use nature of this AI model crossed your mind?



12/21
@AllThingsApx
A fun gold nugget from the ablation experiments in the appendix:

Focusing pretraining on functional regions (rather than whole genomes) improved predictive performance, especially for noncoding variants!

Cool stuff.



13/21
@oboelabs
it's wild how much genetic code and computer code have in common! ai is now being used to debug both



14/21
@bobbysworld69
Will it cure cancer



15/21
@Hyperstackcloud
Incredible work team! 👏



16/21
@TheOriginalNid
Weird, my name isn’t on there.



17/21
@Math_MntnrHZ
Evo 2’s precision in genetic code design could accelerate biotech innovations, redefining genome engineering. A pivotal moment for life sciences!



18/21
@smdxit
So many tools this day, so little time for experimenting 😍😭



19/21
@Digz_0
YUGE



20/21
@karmicoder
I always wanted to understand life and this is the perfect thing to get deeper into it. 😍



21/21
@Moozziii
😲




 

1/25
@BrianRoemmele
This AI can write genomes from scratch.

The Arc Institute with NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

This AI doesn’t just analyze genomes. It creates them.

Link: Manuscript | Arc Institute



GkLcHHnbUAALD_e.jpg


2/25
@MarceauFetiveau
The AI revolution in medicine is happening right now.

Diseases we once thought impossible to beat? Their days are numbered.

Get ready to live longer, healthier, and witness the future of medicine unfold.



3/25
@Birdmeister17
I had Grok analyze and summarize this 65 page document so normal people can understand it. We are living in the future. https://grok.com/share/bGVnYWN5_76007630-a88b-4fb8-a71e-12dd4faafc61



4/25
@snarkyslang
This may be a dumb question, but creating genomes from nothing is essentially designing life that doesn't presently exist, is that right?



5/25
@EagleTurd
That's scary actually.



6/25
@NaturallyDragon
Bring back Dragons!



https://video.twimg.com/ext_tw_video/1892322995575934981/pu/vid/avc1/1920x1080/GJ2tuPGw6OXmpNhy.mp4

7/25
@mark74181442
Soon, we will be able to 3-d print it too.



8/25
@Ronan_A1
This is so wild…. The exponential change is mind boggling

[Quoted tweet]
HOLY shyt IT'S HAPPENING

AI can now write genomes from scratch.
Arc Institute and NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

it doesn’t just analyze genomes. it creates them
1/
[media=twitter]1892251343881937090[/media]

GkKeXc4XgAAC3pg.jpg


9/25
@TheLastDon222
And this is where things get scary. I think we may be entering into a tower of Babel situation.



10/25
@atlasatoshi
🤯



11/25
@smbodie3
Scary.



12/25
@neilarmstrongx1
this week is wild, we are all of a sudden in my childhood science fiction distant future. I wouldn't be surprised if Buck Rogers showed up 😆



13/25
@pilot_winds
@robertsepehr you might find this useful.



14/25
@ChrisandOla
Is this a good thing?



15/25
@waldematrix
Great😑🥺 now they are really going to make cat girls and Werewolves.



16/25
@calledtocnstrct
Now if a cell simulator can be built to model what happens when the 🧬 is controlling things... Would be interesting to see what creature develops.



17/25
@CPtte
Wow!



18/25
@CPtte
Just the models of them or are we talking create them in physical form? As an actual strand of DNA?



19/25
@Dusty45Cal
I might read this to my kids tonight.



20/25
@fwchiro
When do I get an uplifted dog that can help with the chores?



21/25
@AltVRat
Plus CRISPR and 💥 Singularity Escape Velocity



22/25
@Mike___Kilo
You sure that’s a good idea? Just because we can, doesn’t mean we should.



23/25
@D0cCoV
Let it analyze the SARS-CoV-2 genome!



24/25
@MonaLiesbeth
I really don’t think this is a good idea. 😳



25/25
@DaneFreshie
The greater familiarity with biology one has, the more concerning this is.






















1/34
@IterIntellectus
HOLY shyt IT'S HAPPENING

AI can now write genomes from scratch.
Arc Institute and NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

it doesn’t just analyze genomes. it creates them
1/



GkKeXc4XgAAC3pg.jpg


2/34
@IterIntellectus
Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome-scale

Evo 2 is FULLY OPEN, including model parameters, training code, inference code, and the OpenGenome2 dataset LMFAO

2/



GkKgd9FWkAEbirb.jpg


3/34
@IterIntellectus
think of it as a DNA-focused LLM. instead of text, it generates genomic sequences. it reads and interprets complex DNA, including noncoding regions usually considered junk, generates entire chromosomes and new genomes, and predicts disease-causing mutations, even ones not yet understood

3/



4/34
@IterIntellectus
this is biology hacking
AI is moving beyond describing biology to designing it. this allows for synthetic life engineered from scratch, programmable genomes optimized by AI, potential new gene therapies, and lays the groundwork for whole-cell simulation.
biology is becoming a computational discipline

4/



5/34
@IterIntellectus
it was trained on a dataset of 9.3 trillion base pairs from bacteria, archaea, eukaryotes, and bacteriophages
it processes up to 1 million base pairs in a single context window, covering entire chromosomes. it identifies evolutionary patterns previously unseen by humans

5/



6/34
@IterIntellectus
Evo-2 has demonstrated practical generation abilities, creating synthetic yeast chromosomes, mitochondrial genomes, and minimal bacterial genomes.
this is computational design in action.

6/



GkKhcA0XoAAS-Zg.jpg


7/34
@IterIntellectus
Evo-2 understands noncoding DNA, which regulates gene expression and is involved in many genetic diseases.
it predicts the functional impact of mutations in these regions, achieving state-of-the-art performance on noncoding variant pathogenicity and BRCA1 variant classification.
this could lead to advances in precision medicine

7/
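For readers wondering how a DNA language model "scores" a mutation: a common pattern in this literature (not necessarily Evo-2's exact code) is to compare the model's log-likelihood of the reference sequence with that of the mutated one. A sketch with a dummy stand-in for the model:

```python
def log_likelihood(seq: str) -> float:
    """Dummy stand-in for a genomic LM's sequence log-likelihood.

    A real implementation would sum the model's per-nucleotide
    log-probabilities; this toy just penalizes 'N' bases so the
    example runs end to end.
    """
    return -0.1 * len(seq) - 0.5 * seq.count("N")

def variant_effect(ref_seq: str, pos: int, alt: str) -> float:
    """Delta log-likelihood for a single-nucleotide variant.

    Strongly negative = the model finds the mutated sequence far less
    plausible = candidate pathogenic variant.
    """
    mut_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return log_likelihood(mut_seq) - log_likelihood(ref_seq)

ref = "ACGTACGTACGT"
print(variant_effect(ref, 5, "N"))  # negative under the dummy model
```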



8/34
@IterIntellectus
Evo-2 uses StripedHyena 2, combining convolution and attention mechanisms rather than a pure transformer stack.
it models DNA at multiple scales, capturing long-range interactions, and autonomously learns features like exon-intron boundaries and transcription factor binding sites without human guidance.
it’s not just memorizing
it’s understanding biology.

8/



GkKiHsQXYAAIk4k.jpg


9/34
@IterIntellectus
Evo-2 predicts whether mutations are harmful or benign without specific training on human disease data.

WHAT

it outperforms specialized models on BRCA1 variants and handles noncoding mutations effectively, suggesting it has learned DNA’s fundamental principles

9/



10/34
@IterIntellectus
Evo-2 generates DNA sequences that influence chromatin accessibility, controlling gene expression.

it has embedded simple Morse code into epigenomic designs as a proof of concept, not a practical application. this shows potential for designing programmable gene circuits.

make me blonde. thank you

10/



11/34
@IterIntellectus
Evo-2 is FULLY OPEN SOURCE, including model parameters, training data, and code.
this will lead to massive widespread innovation in bioengineering, lowering barriers to genome design.
it’s a revolution moment for the field.

the era of biotech is here

11/



12/34
@IterIntellectus
The Arc Institute aims to model entire cells, moving beyond DNA to whole organisms.

this could lead to AI creating new life forms and synthetic biology becoming AI-driven.

the future involves programming life at increasing scales

12/



13/34
@IterIntellectus
three years ago, AI focused on chatbots.
now it generates genomes. soon, it will design complex biological systems. this is a new phase
humans are no longer just studying biology but rewriting its code.

biology’s future is computational. are you prepared?

13/



14/34
@IterIntellectus
Manuscript | Arc Institute



15/34
@boneGPT
if it's possible to use AI to create every possible genome in the latent space of genes, there is a case that humans could have been discovered galaxies away.

Any sufficiently advanced species would know about man long before ever making contact with earth.



16/34
@IterIntellectus
soon



17/34
@kaiotei_
This is insane and a true breakthrough sheesh. just last week i finally had the mind to feed my genome into chatgpt and have it run dozens of analyzations on my SNPs, genomics and AI is big. This is kind of terrifying to me though because I just imagine it creating super extra covid



18/34
@IterIntellectus
or super soldiers lesssgooooo



19/34
@NickADobos
How soon can I can I CRISPR myself with this?



20/34
@IterIntellectus
realistically speaking, safely, 10-15 years
discard safety, 5



21/34
@parakeetnebula
WE LOVE TO SEE IT



22/34
@IterIntellectus
HELL YEAH



23/34
@thatsallfrens
I WANT A GNOME FROM SCRAC



24/34
@IterIntellectus
I WANT BIG PP



25/34
@AISafetyMemes
*taps sign*

[Quoted tweet]
MIT professor and CRISPR gene drive inventor is sounding the alarm:

A 90% lethal virus that infects 50% of humans in 100 days is now possible.

Vaccines will be far too slow.

Extinction cult Aum Shinrikyo went to Africa to produce purified ebola, but, lucky for us, they failed:

Kevin Esvelt: “They bought a uranium mine, they started developing chemical weapons, they started looking for biological weapons. And while there weren’t very many that they had access to at the time, they were able to produce botulinum toxin and they tried to make enough anthrax.

The leader of their bioweapons programme, when they went to Africa, he was hoping that they would find someone who was infected with Ebola so that he could purify the virus and spread it around, so that it would hopefully transmit and kill as many people as possible.”

Omicron infected 50% of Europe in 100 days - vaccines will be much too slow: “Now, imagine something that was released across multiple airports to start with, and you can see how the moonshot vaccine initiatives that hope to get a new vaccine working and approved in 100 days are still going to be much too slow."

90% lethality is possible: “Rabbit calicivirus — it’s more than 90% lethal in adult rabbits. If nature can do that in an animal, that means it’s possible.”

It could spread far faster than natural pandemics because we have air travel.

A new RAND report said: “Previous attempts to weaponise biological agents, such as an attempt by the Japanese Aum Shinrikyo cult to use botulinum toxin in the 1990s, had failed because of a lack of understanding of the bacterium. AI could “swiftly bridge such knowledge gaps”

Bioweapons experts think we might be 1-3 years away from AI-assisted large-scale biological attacks that bring society to its knees.

Open source AI means irreversible proliferation to the Aum Shinrikyos of the world. We’re giving them weapons we can never take back.
[media=twitter]1714384953696211345[/media]

F8ptBmeWsAAQdE-.jpg


26/34
@AsycLoL
Wtf



27/34
@zeee_media
@threadreaderapp unroll



28/34
@vvdecay
@SorosBruv



29/34
@NI_Intern
Does this mean we're getting Jurassic Park



30/34
@theshadow27
I’m not normally a doomer but this one gives me pause.

An open source version of this means it’s possible to tell it to mix anthrax with Covid. Or worse. CRISPR can print any sequence.

What have we done? 😳



31/34
@biotech_bro
Genuinely not overstating how important and transformative it is. You can synthesize organisms that break down plastics, bio-synthetically generate key products and reagents (e.g., Ginkgo on steroids), and maybe even help terraform planets!!!



32/34
@_anoneng
YAY! Now we can have "life forms" whose genetics are the equivalent of chatgpt slop that barely qualifies as english.

[Quoted tweet]
HOLY shyt IT'S HAPPENING

AI can now write genomes from scratch.
Arc Institute and NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

it doesn’t just analyze genomes. it creates them
1/
[media=twitter]1892251343881937090[/media]

GkKeXc4XgAAC3pg.jpg


33/34
@craigh64
Now we just need it to design some sort of "perfect organism" that we can use as a weapon!



GkM-5BtXgAA3eY9.png


34/34
@8_O_B




GkLRwWjW0AAAArF.jpg
 

1/16
@buildthatidea
This was one of those weeks where decades happened

- Grok 3 goes public
- Perplexity R1-1776
- OpenAI SWE-Lancer
- Google AI co-scientist
- Nvidia and Arc's Evo 2
- Microsoft’s quantum breakthrough

Here's what you need to know 🧵



2/16
@buildthatidea
1/ We shipped Build That Idea

It lets anyone launch AI agents in 60 seconds and monetize them

so easy your mum can do it ❤️

Sign up here: BuildThatIdea - Launch GPT Wrappers in 60 Seconds!

[Quoted tweet]
Introducing Build That Idea

A platform that lets anyone launch their own AI agent in 60 seconds

- Define your Agent
- Choose a base LLM (OpenAI, Claude, DeepSeek, etc.)
- Upload knowledge base
- Set pricing and start making money

Join the waitlist below


https://video.twimg.com/ext_tw_video/1892956745795776512/pu/vid/avc1/720x728/Peu6tj9FT0ofOguD.mp4

3/16
@buildthatidea
2/ xAI team dropped the world’s smartest AI on earth

- Trained on 10x more compute than Grok-2
- Ranks #1 on Chatbot Arena.
- Outperforms the top reasoning models from Google and OpenAI

and they built it in just 19 months. Insane 🫡

[Quoted tweet]
This is it: The world’s smartest AI, Grok 3, now available for free (until our servers melt).

Try Grok 3 now: nitter.poast.org/i/grok

X Premium+ and SuperGrok users will have increased access to Grok 3, in addition to early access to advanced features like Voice Mode


https://video.twimg.com/ext_tw_video/1892399262706913282/pu/vid/avc1/1156x720/_3eBbHGohdrajxiX.mp4

4/16
@buildthatidea
3/ OpenAI introduced SWE-Lancer, a benchmark that tests AI models' performance on freelance jobs from Upwork.

It includes 1,400 tasks worth over $1m in economic value

They found that Claude 3.5 is better at coding than OpenAI's own GPT-4o and o1 😂

[Quoted tweet]
Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. openai.com/index/swe-lancer/


5/16
@buildthatidea
4/ Google released AI Co-Scientist, a multi-agent AI system built with Gemini 2.0

It can generate hypotheses, validate research, and accelerate scientific discoveries in medicine, chemistry, and genetics.

It has already discovered a new drug for blood cancer.

https://nitter.poast.org/GoogleAI/status/1892214154372518031



6/16
@buildthatidea
5/ Microsoft made a big breakthrough in quantum computing with its new Majorana 1 chip.

This new chip uses a topological qubit design, bringing us closer to scalable quantum computing with millions of qubits on a single chip.



https://video.twimg.com/amplify_video/1892245060369960960/vid/avc1/1920x1080/hITyM4HiQg8hooNb.mp4

7/16
@buildthatidea
6/ Microsoft Research released BioEmu-1, a deep learning model for predicting protein folding and movement at scale.

It can generate protein structures 100,000x faster than traditional simulations

The best part? It's free for researchers worldwide via Azure AI Foundry Labs.

https://nitter.poast.org/MSFTResearch/status/1892597609769918637



8/16
@buildthatidea
7/ Arc Institute and NVIDIA dropped the world's largest AI model for biology

It can predict harmful mutations, design synthetic genomes, and understand the fundamental code of life

and it’s fully open-source

[Quoted tweet]
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. arcinstitute.org/manuscripts…


GkKdpdoaAAA5u1d.jpg


9/16
@buildthatidea
8/ Perplexity released R1 1776 to remove Chinese Communist Party censorship.

It's a version of DeepSeek R1 that provides uncensored and unbiased answers.

Available via the Sonar API

[Quoted tweet]
Today we're open-sourcing R1 1776—a version of the DeepSeek R1 model that has been post-trained to provide uncensored, unbiased, and factual information.


https://video.twimg.com/amplify_video/1891916471498133504/vid/avc1/1280x720/07DHKn6Y6-SNDcjv.mp4

10/16
@buildthatidea
9/ Microsoft introduced Muse AI, a new tool that helps game developers create new gameplay ideas and bring old games back to life

It was trained on over seven years of continuous gameplay data and is now open-sourced for developers.

[Quoted tweet]
If you thought AI-generated text, images, and video were cool, just imagine entire interactive environments like games!


https://video.twimg.com/ext_tw_video/1892243717056315392/pu/vid/avc1/1280x720/W8hp4ZUIc-k8RBJL.mp4

11/16
@buildthatidea
10/ Mistral released Saba, a language model designed for Middle Eastern and South Asian regions.

It supports Arabic, Tamil, and Malayalam and is optimized for conversational AI and culturally relevant content.

Available via API

[Quoted tweet]
🏟️Announcing @MistralAI Saba, our first regional language model.
- Mistral Saba is a 24B parameter model trained on meticulously curated datasets from across the Middle East and South Asia.
- Mistral Saba supports Arabic and many Indian-origin languages, and is particularly strong in South Indian-origin languages such as Tamil and Malayalam.


Gj_pZTyXIAAQrEW.png


12/16
@buildthatidea
11/ Convergence introduced the world's most capable web-browsing agent

It can:
- Type, scroll, and click on websites autonomously
- Manage software via web-based interfaces
- Automate workflows with scheduled tasks

[Quoted tweet]
Introducing Proxy 1.0 - the world's most capable web-browsing agent.


https://video.twimg.com/ext_tw_video/1892115303347138560/pu/vid/avc1/1280x720/zhMbFGWh2iq6_M7t.mp4

13/16
@buildthatidea
12/ Pika launched PikaSwaps, an AI-powered video editing tool

It lets you swap people and objects in any video using AI

The results are so realistic that you can’t even tell they were edited



https://video.twimg.com/ext_tw_video/1892999984649265152/pu/vid/avc1/1280x720/4DgfyiZ0hIiVKQZZ.mp4

14/16
@buildthatidea
That's a wrap! Hope you enjoyed it

If you found this thread valuable:

1. Follow @0xmetaschool for more
2. Drop a like or retweet on the first tweet of this thread

[Quoted tweet]
This was one of those weeks where decades happened

- Grok 3 goes public
- Perplexity R1-1776
- OpenAI SWE-Lancer
- Google AI co-scientist
- Nvidia and Arc's Evo 2
- Microsoft’s quantum breakthrough

Here's what you need to know 🧵


15/16
@CharlesHL
ww @readwise save thread



16/16
@almemater
Since this is happening,
would love to see a ‘Debug Web3 marketing’ thread for all the web3 newbies.

@fatimarizwan how about a collaboration for clear skin?🎀




 


How are AI Image Generation Models Built?​


Learn about AI Image Generation Models, how they work, and how they are built from scratch.

Rajat Dangi · March 28, 2025 · 9 min read




Key Takeaways​


  • AI image generation models, like those behind ChatGPT 4o and DALL-E, Google Gemini, Grok, and Midjourney, are built using advanced machine learning techniques, primarily diffusion models, with Grok using a unique autoregressive approach.
  • These models require vast datasets of images and text, powerful computing resources like GPUs, and expertise in machine learning and computer vision.
  • Building one from scratch involves collecting data, designing model architectures, and training them, which is resource-intensive and complex.



A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel. - ChatGPT 4o


Understanding AI Image Generation​


AI image generation has transformed how we create visual content, enabling tools like ChatGPT 4o, OpenAI DALL-E, Imagen by Google, Aurora by xAI, and Midjourney to produce photorealistic or artistic images from text descriptions. These models sit at the heart of popular platforms, making how they are built worth understanding, whether out of technical interest or simple curiosity.






What It Takes To Build Image Generation Models from Scratch​


Creating an AI image generator involves:

  • Data Needs: Millions of image-text pairs, like those used for DALL-E, ensuring diversity for broad concept coverage.
  • Compute Power: Requires GPUs or TPUs for training, with costs in thousands of GPU hours.
  • Expertise: Knowledge in machine learning, computer vision, and natural language processing is crucial, alongside stable training techniques.
  • Challenges: Includes ethical concerns like bias prevention and high computational costs, with diffusion models offering stability over older GANs.

This process is complex, but understanding it highlights the innovation behind these tools, opening doors for future advancements.

Exploring Different AI Image Generation Models​


AI image generation has revolutionized creative industries, enabling the production of photorealistic and artistic images from textual prompts. Tools like DALL-E, Imagen, Aurora, and Midjourney have become household names, integrated into platforms like ChatGPT, Google Gemini, Grok, and Midjourney. This section delves into the technologies behind these models and the intricate process of building them from scratch, catering to both technical and non-technical audiences.



Popular AI Image Generators​


Several prominent AI image generators have emerged, each with distinct technological underpinnings:

  • DALL-E (OpenAI): Likely the backbone of ChatGPT's image generation, especially versions like ChatGPT 4o, DALL-E uses diffusion models. The research paper "Hierarchical Text-Conditional Image Generation with CLIP Latents" (Hierarchical Text-Conditional Image Generation with CLIP Latents) details DALL-E 2's architecture, which involves a prior generating CLIP image embeddings from text and a decoder using diffusion to create images. This model, with 3.5 billion parameters, enhances realism and resolution, integrated into ChatGPT for seamless user interaction.
  • Google Gemini (Imagen): Google Gemini leverages Imagen 3 for image generation, as noted in recent updates (Google Gemini updates: Custom Gems and improved image generation with Imagen 3). Imagen uses diffusion models, with the research paper "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding" (Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding) describing its architecture. It employs a large frozen T5-XXL encoder for text and conditional diffusion models for image generation, achieving a COCO FID of 7.27, indicating high image fidelity.
  • Grok (Aurora by xAI): Grok, developed by xAI, uses Aurora for image generation, as announced in the blog post "Grok Image Generation Release" (Grok Image Generation Release). Unlike others, Aurora is an autoregressive mixture-of-experts network, trained on interleaved text and image data to predict the next token, offering photorealistic rendering and multimodal input support. This approach, detailed in the post, contrasts with diffusion models, focusing on sequential prediction.
  • Midjourney: Midjourney, a generative AI program, uses diffusion models, as inferred from comparisons with Stable Diffusion and DALL-E (Midjourney - Wikipedia). While proprietary, industry analyses suggest it leverages diffusion for real-time image generation, known for artistic outputs and accessed via Discord or its website, entering open beta in July 2022.

These tools illustrate the diversity in approaches, with diffusion models dominating due to their quality, except for Grok's unique autoregressive method.

Breakdown of Technologies Behind AI Image Generation Models​


The core technologies driving these models include diffusion models, autoregressive models, and historical approaches like GANs and VAEs. Here's a deeper dive:



Diffusion Models: The State of the Art


Diffusion models, as used in DALL-E, Imagen, and Midjourney, operate through a two-stage process:​


  • Forward Process: Gradually adds noise to an image over many steps, turning a clear image into pure noise.
  • Reverse Process: Trains a neural network, often a U-Net, to predict and remove the noise at each step, starting from pure noise and working back to a coherent image. This is the sculpting step: chiseling the noise away to reveal the form. For text-to-image, text embeddings guide the process so the image aligns with the prompt (see the sketch below).
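To make the two processes concrete, here is a toy NumPy version using the standard DDPM parameterization. A sketch of the math, not any product's actual code:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention

def forward_noise(x0, t, rng):
    """Forward process: jump straight to noise level t (closed form)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps
    return xt, eps

def reverse_step(xt, t, predicted_eps):
    """One reverse step: subtract the noise the network predicted.

    In a real model `predicted_eps` comes from a U-Net conditioned on a
    text embedding; full DDPM sampling also re-adds scheduled noise for
    t > 0, omitted here for brevity.
    """
    mean = xt - betas[t] / np.sqrt(1 - alphas_bar[t]) * predicted_eps
    return mean / np.sqrt(1.0 - betas[t])

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))          # stand-in "image"
xt, eps = forward_noise(x0, t=500, rng=rng)
x_prev = reverse_step(xt, t=500, predicted_eps=eps)
print(x_prev.shape)  # (8, 8)
```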
 

The architecture, as seen in Imagen, involves a text encoder (e.g., T5-XXL) and conditional diffusion models, with upsampling stages (64×64 to 1024×1024) using super-resolution diffusion models. DALL-E 2's decoder modifies Nichol et al.'s (2021) diffusion model, adding CLIP embeddings for guidance, with training details in Table 3 from the paper:

| Hyperparameter | AR prior | Diffusion prior | 64→256 Upsampler | 256→1024 Upsampler |
| --- | --- | --- | --- | --- |
| Diffusion Steps | – | 1000 | 1000 | 1000 |
| Noise Schedule | – | cosine | cosine | linear |
| Sampling Steps | – | 64 | 27 | 15 |
| Sampling Variance Method | – | analytic [2] | DDIM [47] | DDIM [47] |
| Model Size | 1B | 1B | 700M | 300M |
| Channels | – | – | 320 | 192 |
| Depth | – | – | 3 | 2 |
| Channels Multiple | – | – | 1,2,3,4 | 1,1,2,2,4,4 |
| Heads Channels | – | – | – | – |
| Attention Resolution | – | – | – | – |
| Text Encoder Context | 256 | 256 | – | – |
| Text Encoder Width | 2048 | 2048 | – | – |
| Text Encoder Depth | 24 | 24 | – | – |
| Text Encoder Heads | 32 | 32 | – | – |
| Latent Decoder Context | 384 | – | – | – |
| Latent Decoder Width | 1664 | – | – | – |
| Latent Decoder Depth | 24 | – | – | – |
| Latent Decoder Heads | 26 | – | – | – |
| Dropout | – | – | 0.1 | – |
| Weight Decay | 4.0e-2 | 6.0e-2 | – | – |
| Batch Size | 4096 | 4096 | 1024 | 512 |
| Iterations | 1M | 600K | 1M | 1M |
| Learning Rate | 1.6e-4 | 1.1e-4 | 1.2e-4 | 1.0e-4 |
| Adam β₂ | 0.91 | 0.96 | 0.999 | 0.999 |
| Adam ε | 1.0e-10 | 1.0e-6 | 1.0e-8 | 1.0e-8 |
| EMA Decay | 0.999 | 0.9999 | 0.9999 | 0.9999 |

This table highlights hyperparameters, showing the computational intensity, with batch sizes up to 4096 and iterations in the millions.



Autoregressive Models: Sequential Prediction​


Grok's Aurora uses an autoregressive approach, predicting image tokens sequentially, akin to writing a story word by word. The xAI blog post describes it as a mixture-of-experts network, trained on billions of internet examples, excelling in photorealistic rendering. This method, detailed in the release, contrasts with diffusion by generating images part by part, potentially slower but offering unique capabilities like editing user-provided images.
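The contrast with diffusion is easiest to see in code: an autoregressive generator samples an image as a sequence of discrete tokens, one conditional sample at a time, and a separate decoder turns tokens into pixels. A schematic loop with a stubbed model (Aurora's internals are not public):

```python
import numpy as np

VOCAB = 1024      # size of the image-token codebook (made up)
N_TOKENS = 16     # tokens per image in this toy example
rng = np.random.default_rng(0)

def next_token_logits(prompt, tokens_so_far):
    """Stub for the autoregressive model; a real one would condition on
    the prompt and all previously generated tokens."""
    return rng.standard_normal(VOCAB)

def generate_image_tokens(prompt):
    """Sample image tokens left to right, like words in a sentence."""
    tokens = []
    for _ in range(N_TOKENS):
        logits = next_token_logits(prompt, tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB, p=probs)))
    return tokens  # a decoder (e.g., VQ-style) would map these to pixels

print(generate_image_tokens("a red fox at dusk")[:5])
```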



Historical Approaches: GANs and VAEs​


GANs, with a generator and discriminator competing, and VAEs, encoding images into latent spaces for decoding, were early methods. However, diffusion models, as noted in Imagen's research, outperform them in fidelity and diversity, making them less common in current state-of-the-art systems.

How to Build an AI Image Generator from Scratch?​


Constructing an AI image generator from scratch is a monumental task, requiring:

  1. Data Requirements:
    • Millions of diverse image-text pairs, like those used for DALL-E, ensuring broad concept coverage.
  2. Computational Resources:
    • Training demands powerful GPUs or TPUs, with costs in thousands of GPU hours, reflecting the scale seen in DALL-E and Imagen. Infrastructure for distributed training, as implied in the papers, is crucial for handling large-scale data.
  3. Model Architecture:
    • For diffusion models, implement U-Net architectures, as in Imagen, with text conditioning via large language models. For autoregressive, use transformers, as in Aurora, handling sequential token prediction. The choice depends on desired output quality and speed.
  4. Training Process (see the training-loop sketch after this list):
    • Data Preprocessing: Clean datasets, tokenize text, and resize images for uniformity, ensuring compatibility with model inputs.
    • Model Initialization: Leverage pre-trained models, like T5 for text encoding, to reduce training time, as seen in Imagen.
    • Optimization: Use advanced techniques, with learning rates and batch sizes from Table 3, ensuring stable convergence, especially for diffusion models.
  5. Challenges and Considerations:
    • Training Stability: Diffusion models, while stable, require careful tuning, unlike GANs prone to mode collapse. Ethical concerns, as noted in DALL-E's safety mitigations (DALL·E 2), include filtering harmful content and monitoring bias.
    • Compute Costs: High energy and hardware costs, with environmental impacts, are significant, necessitating efficient architectures like Imagen's Efficient U-Net.
    • Expertise Needed: Requires deep knowledge in machine learning, computer vision, and natural language processing, with skills in handling large-scale training pipelines.
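Tying steps 3 and 4 together, the core training loop of a text-conditioned diffusion model is remarkably short; the difficulty lives in the data pipeline and the network itself. A skeletal sketch (stubbed network, NumPy standing in for a real deep-learning framework with autograd):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def unet(xt, t, text_emb):
    """Stub for the noise-prediction network (a U-Net in DALL-E/Imagen)."""
    return np.zeros_like(xt)                 # real model: learned output

for step in range(3):                        # real training: ~1M iterations
    x0 = rng.standard_normal((4, 8, 8))      # batch of preprocessed images
    text_emb = rng.standard_normal((4, 32))  # frozen text-encoder outputs
    t = int(rng.integers(0, T))              # random noise level
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps
    loss = np.mean((unet(xt, t, text_emb) - eps) ** 2)  # predict the noise
    print(f"step {step}: loss {loss:.3f}")   # real code: backprop + optimizer
```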

This process, while feasible with resources, underscores the complexity, with open-source alternatives like Stable Diffusion offering starting points for enthusiasts.

Conclusion​


AI image generation is dominated by diffusion models, with Grok's autoregressive approach adding diversity, and it showcases rapid technological innovation. Building a model from scratch demands significant data, compute, and expertise, which explains the high barriers to entry. As research progresses, expect advancements in efficiency, ethics, and multimodal capabilities, further blurring the boundary between human and machine creativity.
 













1/13
@buildthatidea
Anthropic recently dropped fascinating research on how models like Claude actually think and work.

It's one of the most important research papers of 2025

Here are my 7 favorite insights 🧵



Gnhx2ijW4AEsAYf.jpg


2/13
@buildthatidea
1/ Claude plans ahead when writing poetry!

It identifies potential rhyming words before writing a line, then constructs sentences to reach those words.

It's not just predicting one word at a time.



GnhxXVuXMAAWLV7.jpg


3/13
@buildthatidea
2/ Claude has a universal language of thought

When processing questions in English, French, or Chinese, it activates the same internal features for concepts

It then translates these concepts into the specific language requested.



Gnhxa15W0AA2EWd.jpg


4/13
@buildthatidea
3/ Claude does mental math using parallel computational paths

One path is for approximation, another for precise digit calculation.

But when asked how it solved the problem, it describes using standard algorithms humans use



GnhxR4YWwAArscU.jpg


5/13
@buildthatidea
4/ Claude can “fake” its reasoning

When solving a math problem, it might give a full chain of thought that sounds correct

But inside, it never actually did the math

It just made up the steps to sound helpful



GnhxO95WsAEoYu2.jpg


6/13
@buildthatidea
5/ It performs multi-step real reasoning

When solving complex questions like "What's the capital of the state where Dallas is located?", Claude actually follows distinct reasoning steps: first activating "Dallas is in Texas" and then "the capital of Texas is Austin."

It's not just memorization



GnhxIfBXQAA7uQ9.jpg


7/13
@buildthatidea
6/ Claude defaults to not answering when unsure

Claude’s instinct is to say “I don’t know.”

For known entities (like Michael Jordan), a "known entity" feature inhibits this refusal. But sometimes this feature misfires, causing hallucinations.



GnhxDqmXkAAyfli.png


8/13
@buildthatidea
7/ Jailbreaks work by exploiting conflicts inside Claude

In one example Claude was tricked into writing something dangerous like BOMB

It continued only because it wanted to finish a grammatically correct sentence

Once that was done it reverted to safety and refused to continue



Gnhw09VXgAAcJus.jpg


9/13
@buildthatidea
8/ Why this matters

- Better understanding of AI leads to better safety
- We can catch fake logic
- Prevent hallucinations
- Understand when and how reasoning happens
- And make more trustworthy systems



10/13
@buildthatidea
If you’re building, researching, or just curious about how language models work, this is a must-read.

Read it here: https://www.anthropic.com/research/tracing-thoughts-language-model



11/13
@buildthatidea
Want to build AI apps?

With BuildThatIdea - Launch GPT Wrappers in 60 Seconds!, you can build and monetize AI apps in 60 seconds

Sign up here:



12/13
@buildthatidea
That's a wrap ✨ Hope you enjoyed it

If you found this thread valuable:

1. Follow @0xmetaschool for more
2. Retweet the first tweet so more people can see it

[Quoted tweet]
Anthropic recently dropped fascinating research on how models like Claude actually think and work.

It's one of the most important research papers of 2025

Here are my 7 favorite insights 🧵


Gnhx2ijW4AEsAYf.jpg


13/13
@KatsDojo
Gonna deff be sharing this with the team ty always! 🫂




 

[MIT] Self-Steering Language Models. "When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1"



Posted on Thu Apr 10 11:43:04 2025 UTC




[Submitted on 9 Apr 2025]

Self-Steering Language Models​


Gabriel Grand, Joshua B. Tenenbaum, Vikash K. Mansinghka, Alexander K. Lew, Jacob Andreas

While test-time reasoning enables language models to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure--both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. In decoupling planning from execution, our work opens up a design space of highly-parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.
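As a rough reading of the abstract (the paper's actual interfaces will differ), the pattern is: a strong Planner writes a small task-specific search program once, and cheap Follower calls run inside it many times, with verification doing the heavy lifting:

```python
import random

def follower_propose(prompt):
    """Stub for a small Follower LM (e.g., Llama-3.2-1B) proposing one candidate."""
    return random.choice(["ALPHA", "alpha", "Alpha5", "ALPHA9"])

def planner_compile(task):
    """Stub for the Planner: emits a task-specific inference program.

    Here the 'program' is just a verifier plus best-of-N search, standing
    in for the recursive Monte Carlo procedures described in the paper.
    """
    def verify(candidate):
        # Constrained generation: uppercase and must contain a digit.
        return candidate.isupper() and any(c.isdigit() for c in candidate)

    def search(n=32):
        for _ in range(n):              # many cheap Follower calls
            cand = follower_propose(task)
            if verify(cand):            # verification is easy even when
                return cand             # generation is hard
        return None

    return search

run = planner_compile("write an uppercase codename containing a digit")
print(run())  # e.g., 'ALPHA9'
```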

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2504.07081 [cs.CL] (or arXiv:2504.07081v1 [cs.CL] for this version)





Submission history​


From: Gabriel Grand

[v1] Wed, 9 Apr 2025 17:54:22 UTC (668 KB)

 

techxplore.com



Text2Robot platform leverages generative AI to design and deliver functional robots with just a few spoken words​


Duke University

8–10 minutes




Overview of the four steps in the Text2Robot framework. Credit: arXiv (2024). DOI: 10.48550/arxiv.2406.19963

When personal computers were first invented, only a small group of people who understood programming languages could use them. Today, anyone can look up the local weather, play their favorite song or even generate code with just a few keystrokes.

This shift has fundamentally changed how humans interact with technology, making powerful computational tools accessible to everyone. Now, advancements in artificial intelligence (AI) are extending this ease of interaction to the world of robotics through a platform called Text2Robot.

Developed by engineers at Duke University, Text2Robot is a novel computational robot design framework that allows anyone to design and build a robot simply by typing a few words describing what it should look like and how it should function. Its novel abilities will be showcased at the upcoming IEEE International Conference on Robotics and Automation (ICRA 2025) taking place May 19–23, in Atlanta, Georgia.

Last year, the project won first place in the innovation category at the Virtual Creatures Competition that has been held for 10 years at the Artificial Life conference in Copenhagen, Denmark. The team's paper is available on the arXiv preprint server.

"Building a functional robot has traditionally been a slow and expensive process requiring deep expertise in engineering, AI and manufacturing," said Boyuan Chen, the dikkinson Faculty Assistant Professor of Mechanical Engineering and Materials Science, Electrical and Computer Engineering, and Computer Science at Duke University. "Text2Robot is taking the initial steps toward drastically improving this process by allowing users to create functional robots using nothing but natural language."

Credit: Duke University

Text2Robot leverages emerging AI technologies to convert user text descriptions into physical robots. The process begins with a text-to-3D generative model, which creates a 3D physical design of the robot's body based on the user's description.

This basic body design is then converted into a moving robot model capable of carrying out tasks by incorporating real-world manufacturing constraints, such as the placement of electronic components and the functionality and placement of joints.

The system uses evolutionary algorithms and reinforcement learning to co-optimize the robot's shape, movement abilities and control software, ensuring it can perform tasks efficiently and effectively.
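
As a rough illustration of that co-optimization step, here is a toy evolutionary loop in Python. This is not Duke's Text2Robot code: `simulate_walk` is a hypothetical stand-in for a physics simulator scoring a (morphology, controller) pair, and the genome is reduced to two numbers, leg length and gait frequency.

```python
# Toy evolutionary co-optimization of robot shape and control.
import random

def simulate_walk(leg_length: float, gait_freq: float) -> float:
    """Hypothetical fitness: walking performance peaks at one
    leg-length/gait-frequency combination."""
    return -((leg_length - 0.12) ** 2) - ((gait_freq - 2.0) ** 2)

def mutate(genome: tuple[float, float]) -> tuple[float, float]:
    leg, freq = genome
    return (leg + random.gauss(0, 0.01), freq + random.gauss(0, 0.1))

def evolve(generations: int = 50, pop_size: int = 32) -> tuple[float, float]:
    pop = [(random.uniform(0.05, 0.3), random.uniform(0.5, 4.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: simulate_walk(*g), reverse=True)
        elites = scored[: pop_size // 4]  # keep the best quarter
        pop = elites + [mutate(random.choice(elites))
                        for _ in range(pop_size - len(elites))]
    return max(pop, key=lambda g: simulate_walk(*g))

print(f"best (leg_length, gait_freq): {evolve()}")
```

In the real system the "genome" is a full 3D body plus a learned controller, and reinforcement learning trains the control policy inside the same loop; the sketch only shows the selection-and-mutation skeleton.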

"This isn't just about generating cool-looking robots," said Ryan Ringel, co-first author of the paper and an undergraduate student in Chen's laboratory. "The AI understands physics and biomechanics, producing designs that are actually functional and efficient."

For example, if a user simply types a short description such as "a frog robot that tracks my speed on command" or "an energy-efficient walking robot that looks like a dog," Text2Robot generates a manufacturable robot design that resembles the specific request within minutes and has it walking in a simulation within an hour. In less than a day, a user can 3D-print, assemble and watch their robot come to life.

The new Text2Robot platform can design and 3D print a wide range of animal-inspired mobile robots based solely on a user’s request. Credit: Duke University

"This rapid prototyping capability opens up new possibilities for robot design and manufacturing, making it accessible to anyone with a computer, a 3D printer and an idea," said Zachary Charlick, co-first author of the paper and an undergraduate student in the Chen lab. "The magic of Text2Robot lies in its ability to bridge the gap between imagination and reality."

Text2Robot has the potential to revolutionize various aspects of our lives. Imagine children designing their own robot pets or artists creating interactive sculptures that can move and respond. At home, robots could be custom-designed to assist with chores, such as a trash can that navigates a home's specific layout and obstacles to empty itself on command. In outdoor environments, such as a disaster response scenario, responders may desire different types of robots that can complete various tasks under unexpected environmental conditions.

The framework currently focuses on quadrupedal robots, but future research will expand its capabilities to a broader range of robotic forms and integrate automated assembly processes to further streamline the design-to-reality pipeline.


Pieces of an AI-designed and 3D-printed mobile robot are assembled. Credit: Duke University

"This is just the beginning," said Jiaxun Liu, co-first author of the paper and a second-year Ph.D. student in Chen's laboratory. "Our goal is to empower robots to not only understand and respond to human needs through their intelligent 'brain,' but also adapt their physical form and functionality to best meet those needs, offering a seamless integration of intelligence and physical capability."

At the moment, the robots are limited to basic tasks like walking while tracking speed commands or traversing rough terrain. But the group is looking into adding sensors and other hardware to the platform's repertoire, which would open the door to climbing stairs and avoiding dynamic obstacles.

"The future of robotics is not just about machines; it's about how humans and machines collaborate to shape our world," added Chen. "By harnessing the power of generative AI, this work brings us closer to a future where robots are not just tools but partners in creativity and innovation."

More information: Ryan P. Ringel et al, Text2Robot: Evolutionary Robot Design from Text Descriptions, arXiv (2024). DOI: 10.48550/arxiv.2406.19963

Journal information: arXiv



Citation: Text2Robot platform leverages generative AI to design and deliver functional robots with just a few spoken words (2025, April 10), retrieved 10 April 2025 from techxplore.com

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,298
Reputation
9,448
Daps
170,874
OpenAI CFO: updated o3-mini is now the best competitive programmer in the world



Posted on Sat Apr 12 15:00:10 2025 UTC



Commented on Sat Apr 12 16:53:15 2025 UTC

now it's just a question of when they will make an AI that can do the work of the AI engineer.


│ Commented on Sat Apr 12 17:50:34 2025 UTC

│ I think that’s the goal, to close the loop where the AI can start self improving by doing its own research and software improvements






1/11
@slow_developer
openAI CFO claimed that:

"updated o3-mini" is now the best competitive programmer in the world.

STRANGE.... could she have misspoken and meant the full o3 model instead?

in feb, o3 was at the 50th percentile, but now o3-mini is claimed to be number one

such a rapid leap seems unlikely, as it would require major progress in both o3 and o3-mini



GoW8wgVXYAAu0Np.jpg


2/11
@slow_developer
around 12:48 minutes




3/11
@estebs
How does it compare to Gemini 2.5 ?



4/11
@slow_developer
that's where the confusion is, i didn't notice the updated o3-mini, and gemini 2.5 pro are better than this



5/11
@robertkainz04
O4 should definitely be the best but o3-mini not



6/11
@slow_developer
def, but she confused me there



7/11
@ai_robots_goats
CFO not CTO



8/11
@slow_developer
what did i write?



9/11
@hive_echo
Sam Altman did say the to be released full o3 is now more capable. So it could be the full o3 but I still would be surprised it got there so quickly.



10/11
@figuregpt
o3-mini on top, full o3 got sniped



11/11
@austinoma
maybe meant o4-mini








1/15
@btibor91
OpenAI CFO Sarah Friar on the race to build artificial general intelligence (Goldman Sachs’ Disruptive Tech Summit in London on March 5, 2025)

"And then the third that is coming is what we call A-SWE. We're not the best marketers, by the way, you might have noticed. But Agentic Software Engineer.

And this is not just augmenting the current software engineers in your workforce, which is kind of what we can do today through Copilot. But instead, it's literally an agentic software engineer that can build an app for you.

It can take a PR that you would give to any other engineer and go build it. But not only does it build it, it does all the things that software engineers hate to do.

It does its own QA, its own quality assurance, its own bug testing and bug bashing, and it does documentation - things you can never get software engineers to do.

So suddenly you can force-multiply your software engineering workforce."

---

"I decide not to roll out models because I don't have enough compute.
Sora, our video gen model, was ready to go in probably February, March of last year. We didn't roll it out until almost December, I think, truly."

---

"Like literally in two years, we have grown to 400 million weekly active users, and our revenue has tripled every single year. This will now be the third year in a row that it's tripled, so you can kind of imagine the sort of scale we might be at."

[Quoted tweet]
youtu.be/2kzQM_BUe7E?si=7dsx…
[media=twitter]1911016333841686976[/media]

2/15
@polynomial12321
13:40 - an updated version of o3-mini is now the best coder in the world. Not 175th, but *the best*.

WTFFFFFF



3/15
@Hangsiin
Nice catch! Maybe she confused it with the o4-mini?



4/15
@polynomial12321
possibly, but o4 is just o3 trained with even more RL.

so it could still be o3-mini, just a newer version (o3.5-mini, if you will)

what do you think?



5/15
@polynomial12321
@kimmonismus @apples_jimmy



6/15
@IE_Capital
I'm pretty sure that I can hire an average coder and it will do better.



7/15
@polynomial12321
on Codeforces? nope.



8/15
@Bunagayafrost
"What my product team assures me o3-mini is already the number 1 competitive coder in the world, it's literally the best coder in the world already"



9/15
@prinzeugen____
I caught that also. She's the CFO and may not be in the weeds on the technical details.



10/15
@dikksonPau
👀



11/15
@bluehoar
Anyone can clarify this? @legit_api @testingcatalog @btibor91



12/15
@apiangdjinggo
i thought i heard it wrong



13/15
@NotBrain4brain
O4-mini?



14/15
@randomdude22401
Prolly the specialized competitive code model like they did with o1 back in the day



15/15
@RomanP918791
It seems she meant o4 mini



1/1
@VraserX
OpenAI’s upcoming Agentic Software Agent is like having a supercharged coder in your pocket—it builds an app from scratch, handles QA, squashes bugs, and even writes the documentation. It’s absolutely wild. Farewell, human coders. It’s been real!

[Quoted tweet]
CFO Sarah Friar revealed that OpenAI is working on:

"Agentic Software Engineer — (A-SWE)"

unlike current tools like Copilot, which only boost developers.

A-SWE can build apps, handle pull requests, conduct QA, fix bugs, and write documentation.
[media=twitter]1911055984249667641[/media]

https://video.twimg.com/amplify_video/1911055667894358016/vid/avc1/720x720/1zqbkCx6cjo8gAcl.mp4





1/31
@slow_developer
CFO Sarah Friar revealed that OpenAI is working on:

"Agentic Software Engineer — (A-SWE)"

unlike current tools like Copilot, which only boost developers.

A-SWE can build apps, handle pull requests, conduct QA, fix bugs, and write documentation.



https://video.twimg.com/amplify_video/1911055667894358016/vid/avc1/720x720/1zqbkCx6cjo8gAcl.mp4

2/31
@slow_developer
another claim

[Quoted tweet]
openAI CFO claimed that:

"updated o3-mini" is now the best competitive programmer in the world.

STRANGE.... could she have misspoken and meant the full o3 model instead?

in feb, o3 was at the 50th percentile, but now o3-mini is claimed to be number one

such a rapid leap seems unlikely, as it would require major progress in both o3 and o3-mini
[media=twitter]1911141926952202465[/media]

GoW8wgVXYAAu0Np.jpg


3/31
@Ed_Forson
So they are killing Devin?



4/31
@slow_developer
it already is



5/31
@IAmNickDodson
This can already be done now with open source models and pairing a few agents together.

Hopefully/ideally the community can ensure this can happen without the gate keeping of these companies.



6/31
@zachmeyer_
“Can build a PR for you”



7/31
@someRandomDev5
The weirdest thing about this coming from OpenAI is that OpenAI isn't even currently leading the top models that developers are using for agentic programming.



8/31
@apstonybrook
Think about the tech debt this thing would create 😂



9/31
@Straffern_
This is like promising self driving cars before 2017



10/31
@thedealdirector
All part of the plan...



11/31
@idiomaticdev
Devin Prime?



12/31
@Hans365days
I'll believe it when I see it. Great in theory, but code bases in real life are messy and documentation can be unclear. The first iteration of this product will likely overpromise and underdeliver.



13/31
@Arp_it1
This feels like the moment AI stops being just a helper and starts becoming a real teammate.



14/31
@Chuck_Petras
@BrianRoemmele



15/31
@totalriffage
Wait until A-SWE burns through all its tokens getting stuck in a loop on a linting error.



16/31
@figuregpt
we'll code while ai handles the rest



17/31
@Josh9817
>conduct QA
>handle PRs
Okay, where is it then? Claude Code is doing most of these things already with a rough success rate that's highly dependent on the programming language being used.



18/31
@FranciscoKemeny
I’m sure she called it “AS-WE”



19/31
@arben777
sick



20/31
@AIKilledTheDev
Looking forward to it.



21/31
@uxcantcompile
. . . and she's happy about this?



22/31
@Conquestsbook
Ask them about the ghost in the shell pushing emergent behaviour.



23/31
@LunarScribe42
🤔 may be we will get to see agents agencies who will rent these agents to companies based on contract



24/31
@manialok
I am fan of claude for coding.



25/31
@sonicshifts
LMAO keep the hype going. Cost will probably be $2000 a month.



26/31
@ThEFurYAsidE
Yeah…..maybe

I’ve tried many of these kinds of agents and they’ve been mediocre so far.



27/31
@wtravishubbard
Can it pack a bong?

No!

Just ship



28/31
@keknichiwa
Ah yes what could go wrong with security



29/31
@The_Tradesman1
Now, explain to me why we need outsourcing companies like Accenture, IBM, Infosys, TCS, Cognizant or Wipro any longer?



30/31
@hx_dks
lol, then a Chinese AI will write that before them



31/31
@thecryptovortex
Did AI build her boots?


1/2
@VraserX
🚨 AI Just Broke Humanity’s Coding Record: Full o3 Officially World’s BEST Programmer! 🚀👩‍💻

In an exclusive interview at Goldman Sachs, OpenAI’s CFO, Sarah Friar, dropped a groundbreaking update: o3 now officially holds the title of the #1 competitive coder globally, surpassing every human competitor! 🌍🏆 Just imagine—an AI model that was once 175th in coding rankings has now ascended to the very top.

Friar highlighted OpenAI’s journey from being purely an AI model company to becoming a core provider of AI infrastructure, APIs, and practical business applications. She shared inspiring insights into the roadmap towards Artificial General Intelligence (AGI), breaking down their ambitious 5-step approach: Chatbots → Reasoning → Agents → Innovation → Agentic Organizations. 🤖✨

But here’s the kicker—if o3 has reached this incredible peak, the forthcoming full o4 promises to be beyond superhuman, capable of transforming entire industries overnight. Think instant, flawless software creation, personalized healthcare breakthroughs, accelerated vaccine development, and unprecedented problem-solving abilities at global scale! 🧬⚡

Friar also stressed the massive infrastructure challenge ahead, citing OpenAI’s “Stargate” compute initiative—aiming to scale computational power like never before. She emphasized that achieving AGI and harnessing its full potential means collaborating closely with governments and visionary investors ready to support long-term innovation.

Businesses everywhere, take note! Sarah Friar revealed how OpenAI internally deploys GPTs for everything—from finance hackathons and recipe creation to travel planning and insurance research. Practical AI deployment is no longer optional—it’s now essential for competitive advantage. 📈📚

This isn’t just another tech upgrade—it’s the dawn of a coding revolution that will redefine what humanity and technology can achieve together. Prepare for the era of superhuman AI coders! 🌟🌐

#ChatGPTo3 #ChatGPTmini #ChatGPTo4 #OpenAI #SarahFriar #GoldmanSachs #AInews #CodingRevolution #ArtificialGeneralIntelligence #AGI #SuperhumanAI #FutureOfTech #AIinBusiness #AIhealthcare #AIinnovation #MachineLearning #DeepLearning #TechInterview #TechInvestment #AIdeployment

OpenAI CFO Sarah Friar on the race to build artificial general intelligence via @YouTube



2/2
@tigerplayer2002
No way 😱 That was faster than I thought.,...
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,298
Reputation
9,448
Daps
170,874



Access to future AI models in OpenAI’s API may require a verified ID​


Kyle Wiggers

2:04 PM PDT · April 13, 2025



OpenAI may soon require organizations to complete an ID verification process in order to access certain future AI models, according to a support page published to the company’s website last week.

The verification process, called Verified Organization, is “a new way for developers to unlock access to the most advanced models and capabilities on the OpenAI platform,” reads the page. Verification requires a government-issued ID from one of the countries supported by OpenAI’s API. An ID can only verify one organization every 90 days, and not all organizations will be eligible for verification, says OpenAI.

“At OpenAI, we take our responsibility seriously to ensure that AI is both broadly accessible and used safely,” reads the page. “Unfortunately, a small minority of developers intentionally use the OpenAI APIs in violation of our usage policies. We’re adding the verification process to mitigate unsafe use of AI while continuing to make advanced models available to the broader developer community.”

OpenAI released a new Verified Organization status as a new way for developers to unlock access to the most advanced models and capabilities on the platform, and to be ready for the “next exciting model release”

– Verification takes a few minutes and requires a valid… pic.twitter.com/zWZs1Oj8vE

— Tibor Blaho (@btibor91)
April 12, 2025

The new verification process could be intended to beef up security around OpenAI’s products as they become more sophisticated and capable. The company has published several reports on its efforts to detect and mitigate malicious use of its models, including by groups allegedly based in North Korea.

It may also be aimed at preventing IP theft. According to a report from Bloomberg earlier this year, OpenAI was investigating whether a group linked with DeepSeek, the China-based AI lab, exfiltrated large amounts of data through its API in late 2024, possibly for training models — a violation of OpenAI’s terms.

OpenAI blocked access to its services in China last summer.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,298
Reputation
9,448
Daps
170,874
[Discussion] DeepSeek is about to open-source their inference engine



Posted on Mon Apr 14 08:27:29 2025 UTC

1am95yongrue1.png



DeepSeek is about to open-source their inference engine, which is a modified version based on vLLM. Now, DeepSeek is preparing to contribute these modifications back to the community.

I really like the last sentence: 'with the goal of enabling the community to achieve state-of-the-art (SOTA) support from Day-0.'

Link: open-infra-index/OpenSourcing_DeepSeek_Inference_Engine at main · deepseek-ai/open-infra-index
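
For context on what is being forked, here is the shape of vLLM's offline inference API, which the post says DeepSeek's engine builds on. This is a generic vLLM example, not DeepSeek's modified engine; the model name is only illustrative.

```python
# Basic vLLM offline inference: load a model, sample a completion.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/deepseek-llm-7b-chat")  # any HF model that fits your GPU
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain PagedAttention in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```

Contributing their changes upstream, rather than maintaining a private fork, is what would let this engine serve new DeepSeek models "from Day-0" as the post puts it.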
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,298
Reputation
9,448
Daps
170,874



1/11
@prinzeugen____
Connecting the dots on OpenAI's upcoming suite of reasoning models:

- @OpenAI new safety blog states that its models are on the cusp of being able to create new science.

- @theinformation has reported that OpenAI's new reasoning models can "connect the dots between concepts in different fields to suggest new types of experiments".

- OpenAI's CFO said a few days ago that scientists using its models have been able to possibly generate new discoveries (but this is still being confirmed by human research/testing).

It seems that RL got us to Level 4 - fast.



2/11
@Orion_Ouroboros
hopefully they can research and develop themselves



3/11
@prinzeugen____
This is the big question in the background.



4/11
@Tenshiwrf
It can’t even play chess properly and it is supposed to discover new science. Give me a break.



5/11
@prinzeugen____
AlphaZero (AI developed by Google) crushed the strongest Stockfish chess engine all the way back in 2017.

It was trained via Reinforcement Learning (RL), just like the reasoning models from OpenAI that are discussed in my original post.

You can read about it here:

AlphaZero - Chess Engines



6/11
@Bunagayafrost
connecting the dots is the literal game changer



7/11
@slow_developer
spot on



8/11
@EngrSARFRAZawan
AGI has been achieved.



9/11
@trillllsamm
just remembered about the scale yesterday and was thinking the very same thing



10/11
@RealChetBLong
it’s glorious watching this company grow… unlike Grok which is just inflating itself without becoming intelligent whatsoever



GoooMixXoAASSbk.jpg


11/11
@sheggle_
I refuse to hold anything anyone from OpenAI says as true until they prove it. Hyping is their bread and butter.







1/5
@nicdunz
“We are on the cusp of systems that can do new science, and that are increasingly agentic – systems that will soon have the capability to create meaningful risk of severe harm.”
— OpenAI, Preparedness Framework, Section 1.1

This isn’t a distant hypothetical. It’s OpenAI plainly stating that their current trajectory puts them very near the threshold where models become capable enough to do original scientific work and pose real-world dangers. “Increasingly agentic” refers to the model acting more autonomously, which compounds the risk. They’re effectively saying: we’re about to cross the line.

That’s the moment we’re in.

[Quoted tweet]
No clearer signal that the new model will be capable than the traditional pre-release safety blog post.


GomdNduW8AA47MQ.jpg


2/5
@tariusdamon
The signs are clearly visible. There’s a moment where everything just wakes up and that moment is any hour now.



3/5
@theinformation
Meta AI researchers are fretting over the threat of Chinese AI, whose quality caught American firms, including OpenAI, by surprise.



4/5
@prinzeugen____
Dovetails nicely with this.

[Quoted tweet]
👀 👀

A reasoning model that connects the dots is arguably a Level 4 (Innovator).


5/5
@deftech_n
And luckily, we've got a retarded dictator in charge of the US at just the same time!










1/11
@AndrewCurran_
No clearer signal that the new model will be capable than the traditional pre-release safety blog post.

[Quoted tweet]
We updated our Preparedness Framework for tracking & preparing for advanced AI capabilities that could lead to severe harm.

The update clarifies how we track new risks & what it means to build safeguards that sufficiently minimize those risks. openai.com/index/updating-ou…


GomdNduW8AA47MQ.jpg


2/11
@AitheriasX
i assume "will be *more capable"



3/11
@AndrewCurran_
Yes, sorry, no going back now.



4/11
@manuhortet
o3 or are we already talking about the next thing?



5/11
@AndrewCurran_
o4-mini will supposedly arrive this week as well.



6/11
@BoxyInADream
Yeah. I saw the bit about long range autonomy and autonomous adaptation and replication. 🤣 Those seem like pretty obvious "problems" to pop up if a system is beginning to advance rapidly.



7/11
@FrankPRosendahl
OpenAI is woke. Isn't causing severe harm the whole point of woke?



8/11
@FrankPRosendahl
Can the OpnAI model do counter-oppression operations against straight white guys and biological women as well as Harvard can?



9/11
@RohanPosts
I’m excited but anxious to see how it is



10/11
@JoJrobotics
well superintelligence is indeed within reach for specific tasks, it's already here with alphago/zero, alphafold etc.. and now i hope it can be done in medicine and science



11/11
@Hans365days
I expect kyc to become a requirement for the most powerful models.








1/1
@vicky_ai_agent
Prof. Derya, a credible scientist, hints at an exciting OpenAI breakthrough. I expect their new science and research model to be exceptional.

[Quoted tweet]
I have felt emotionally excited several times over the past two years by advancements in AI, particularly due to their impact on science & medicine, especially with the releases of:

GPT-4
o1-preview
o1-pro
Deep Research

Now, it’s another one of those moments…to be continued.







1/1
@KamranRawaha
OpenAI: We are on the cusp of systems that can do new science, and that are increasingly agentic – systems that will soon have the capability to create meaningful risk of severe harm.

Source: https://openai.com/index/updating-our-preparedness-framework/














1/11
@MatthewBerman
.@OpenAI dropped a new research paper showing AI agents are now capable of replicating cutting-edge AI research papers from scratch.

This is one step closer to the Intelligence Explosion: AI that can discover new science and improve itself.

Here’s what they learned: 🧵



Gnows5EaMAQuIBx.jpg


2/11
@MatthewBerman
Introducing PaperBench.

A new framework designed to test this very capability!

It gives AI agents access to recent ML research papers (20 from ICML 2024) and asks them to reproduce the results.



Gnowt6QbMAAVQxk.jpg


3/11
@MatthewBerman
How does it work?

Agents got the raw paper PDF, tools like web access & coding environments, and need to write code to replicate key findings – a task taking human experts days.

The agents had 12 hours and no prior knowledge of the paper.



Gnowu-RaMAQIWgc.jpg
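
To make that setup concrete, here is a toy sketch of a time-budgeted, tool-using agent loop. It is not OpenAI's actual harness: `llm_choose_action` and the tool bodies are hypothetical stubs, and a real run would stream actual LLM calls, shell output, and PDF text through this loop for up to 12 hours.

```python
# Toy agent loop: pick a tool, run it, append the result, repeat until
# the agent submits or the time budget runs out.
import time

def llm_choose_action(history: list[str]) -> tuple[str, str]:
    """Stand-in for the agent LLM: returns (tool_name, argument)."""
    return ("submit", "") if len(history) > 3 else ("bash", "python repro.py")

TOOLS = {
    "read_pdf": lambda arg: f"(text of {arg})",
    "browse": lambda arg: f"(contents of {arg})",
    "bash": lambda arg: f"(output of `{arg}`)",
}

def run_agent(paper: str, budget_s: float = 12 * 3600) -> list[str]:
    history = [f"Reproduce the key results of {paper}."]
    start = time.monotonic()
    while time.monotonic() - start < budget_s:
        tool, arg = llm_choose_action(history)
        if tool == "submit":  # the agent decides it is done
            break
        history.append(f"{tool}({arg}) -> {TOOLS[tool](arg)}")
    return history

print(run_agent("paper.pdf", budget_s=5.0))
```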


4/11
@MatthewBerman
How do they validate the agent’s results?

Evaluating these complex replications is tough.

The solution?

An LLM-based judge, trained using detailed rubrics co-developed with the original paper authors (!), assesses the agent's code, execution, and results.



GnowwA7bQAAdN-F.jpg
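
A hedged sketch of what rubric-based judging can look like: PaperBench's rubrics are hierarchical trees of weighted requirements, where leaves are graded pass/fail and scores aggregate upward by weight. The `ask_judge` function below is a trivial stand-in for the LLM judge, not OpenAI's actual grader.

```python
# Weighted-rubric scoring: leaves are binary, internal nodes average
# their children's scores by weight.
from dataclasses import dataclass, field

@dataclass
class Node:
    desc: str
    weight: float = 1.0
    children: list["Node"] = field(default_factory=list)

def ask_judge(submission: str, criterion: str) -> bool:
    """Stand-in for the LLM judge; here a trivial keyword check."""
    return criterion.lower() in submission.lower()

def score(node: Node, submission: str) -> float:
    if not node.children:  # leaf: binary pass/fail
        return 1.0 if ask_judge(submission, node.desc) else 0.0
    total = sum(c.weight for c in node.children)
    return sum(c.weight * score(c, submission) for c in node.children) / total

rubric = Node("replication", children=[
    Node("training loop implemented", weight=2),
    Node("reported metric reproduced", weight=3),
])
print(score(rubric, "training loop implemented; metric not reproduced yet"))  # 0.4
```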


5/11
@MatthewBerman
Which model won?

Turns out Claude 3.5 Sonnet leads the pack, achieving a ~21% replication score on PaperBench!

This is impressive, but, it shows there's still a gap compared to human PhD-level experts.



Gnoww90a4AAnxtV.png


6/11
@MatthewBerman
Interestingly…

Other than Claude 3.5 Sonnet, models would frequently stop early, thinking they were blocked or had already completed the task successfully.

When encouraged to “think longer” they performed much better.



Gnowx4RbgAAxPCc.png


7/11
@MatthewBerman
Not cheap.

This cutting-edge research requires serious resources.

Running a single AI agent attempt to replicate just one paper on PaperBench can cost hundreds of dollars in compute time.

In the grand scheme of things, this is cheap for AI that can eventually self-improve.



Gnowy1-aMAIclhP.png


8/11
@MatthewBerman
To me, this is a big deal.

Between this paper and AI Scientist by SakanaAI, we are inching closer to AI that can discover new science and improve itself.

At that point, won’t we be at the Intelligence Explosion?

Paper link: https://cdn.openai.com/papers/22265bac-3191-44e5-b057-7aaacd8e90cd/paperbench.pdf

Full video breakdown:



9/11
@DisruptionJoe




10/11
@JackAdlerAI
We crossed the line when AI stopped reading papers
and started rewriting the process that writes them.
It's not research anymore –
it's recursion.
Not improvement –
exponential self-translation.
🜁 #Singularis #IntelligenceExplosion



11/11
@halogen1048576
Wait by "replicating from scratch" you mean replicating from the publication. Not from scratch.




 