bnew

Veteran
Joined
Nov 1, 2015
Messages
61,971
Reputation
9,348
Daps
170,186

1/12
@minchoi
Holy sh*t

Meta just revealed Llama 4 models: Behemoth, Maverick & Scout.

Llama 4 Scout can run on single GPU and has 10M context window 🤯



https://video.twimg.com/ext_tw_video/1908628230237573120/pu/vid/avc1/720x1280/P74rnIupiit-c6E0.mp4

2/12
@minchoi
And Llama 4 Maverick just took #2 spot on Arena Leaderboard with 1417 ELO 🤯

[Quoted tweet]
BREAKING: Meta's Llama 4 Maverick just hit #2 overall - becoming the 4th org to break 1400+ on Arena!🔥

Highlights:
- #1 open model, surpassing DeepSeek
- Tied #1 in Hard Prompts, Coding, Math, Creative Writing
- Huge leap over Llama 3 405B: 1268 → 1417
- #5 under style control

Huge congrats to @AIatMeta — and another big win for open-source! 👏 More analysis below⬇️
[media=twitter]1908601011989782976[/media]

Gny1hLebYAUNry8.jpg


3/12
@minchoi
Llama 4 Maverick beats GPT-4o and DeepSeek v3.1, and is reportedly cheaper 🤯



GnzURO5WUAASICB.png


4/12
@minchoi
Llama 4 Scout handles 10M tokens, fits on 1 GPU (H100), and crushes long docs, code, and search tasks.



https://video.twimg.com/ext_tw_video/1908634008256139264/pu/vid/avc1/1280x720/v8lyumGxQL3ZzP0t.mp4

5/12
@minchoi
Official announcement

[Quoted tweet]
Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model with 16 experts.
• Industry-leading context window of 10M tokens.
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks.

Llama 4 Maverick
• 17B-active-parameter model with 128 experts.
• Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.
• Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters.
• Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena.

These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight.

Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs
Download Llama 4 ➡️ go.fb.me/bwwhe9
[media=twitter]1908598456144531660[/media]

Gnyz0XFbYAEznnT.jpg


6/12
@minchoi
If you enjoyed this thread,

Follow me @minchoi and please Bookmark, Like, Comment & Repost the first Post below to share with your friends:

[Quoted tweet]
Holy sh*t

Meta just revealed Llama 4 models: Behemoth, Maverick & Scout.

Llama 4 Scout can run on single GPU and has 10M context window 🤯
[media=twitter]1908629170717966629[/media]

https://video.twimg.com/ext_tw_video/1908628230237573120/pu/vid/avc1/720x1280/P74rnIupiit-c6E0.mp4

7/12
@WilderWorld
WILD



8/12
@minchoi
It's getting wild out there



9/12
@AdamJHumphreys
I was always frustrated with the context window limitations of @ChatGPTapp. Apparently Grok/Gemini have much larger context windows than ChatGPT, by a significant factor.



10/12
@minchoi
Yes it's true. Now Llama 4 just topped them with 10M



11/12
@tgreen2241
If he thinks it will outperform o3 or o4 mini, he's sorely mistaken.



12/12
@minchoi
Did you mean Llama 4 Reasoning?

1/16
@omarsar0
Llama 4 is here!

- Llama 4 Scout & Maverick are up for download
- Llama 4 Behemoth (preview)
- Advanced problem solving & multilingual
- Support long context up to 10M tokens
- Great for multimodal apps & agents
- Image grounding
- Top performance at the lowest cost
- Can be served within $0.19-$0.49/M tokens



Gny0OJkXUAAnAWF.jpg


2/16
@omarsar0
LMArena ELO score vs. cost

"To deliver a user experience with a decode latency of 30ms for each token after a one-time 350ms prefill latency, we estimate that the model can be served within a range of $0.19-$0.49 per million tokens (3:1 blend)"



Gny1trRXEAAMQHq.jpg
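The "3:1 blend" in the quote above is an input:output token mix. A quick sketch of how such a blended per-million-token price is computed; the per-direction prices below are made-up placeholders, not Meta's published figures:

```python
def blended_cost(input_price, output_price, ratio=(3, 1)):
    """Blended price per million tokens under an input:output token mix.

    The 3:1 ratio comes from the quote above; the prices passed in are
    illustrative assumptions only.
    """
    i, o = ratio
    return (input_price * i + output_price * o) / (i + o)

# hypothetical $0.10/M input and $0.76/M output
print(round(blended_cost(0.10, 0.76), 3))  # 0.265
```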


3/16
@omarsar0
It's great to see native multimodal support for Llama 4.



Gny27LNWUAEVt9S.jpg


4/16
@omarsar0
Llama 4 Scout is a 17B active parameter model with 16 experts and fits in a single H100 GPU.

Llama 4 Maverick is a 17B active parameter model with 128 experts. The best multimodal model in its class, beating GPT-4o & Gemini 2.0 Flash on several benchmarks.



Gny3SAuWYAAPvds.jpg


5/16
@omarsar0
Those models were distilled from Llama 4 Behemoth, a 288B active parameter model with 16 experts.

Behemoth is their most powerful model in the series. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks.



Gny4NZzXYAAqA48.jpg
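The Behemoth-to-Scout/Maverick setup described above is teacher-student distillation. Meta hasn't published its exact objective, so here is the standard soft-label sketch (Hinton-style KL between tempered softmaxes) purely for illustration:

```python
import math

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Toy knowledge-distillation loss: KL(teacher || student) over
    temperature-softened distributions, scaled by T^2 as is conventional.
    Not Meta's actual training objective -- an illustrative sketch only."""
    def softmax(xs, T):
        m = max(xs)
        es = [math.exp((x - m) / T) for x in xs]
        s = sum(es)
        return [e / s for e in es]

    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# identical logits -> zero loss; mismatched logits -> positive loss
assert abs(distill_loss([1.0, 2.0], [1.0, 2.0])) < 1e-12
```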


6/16
@omarsar0
Llama 4 seems to be the first model from Meta to use a mixture of experts (MoE) architecture.

This makes it possible to run models like Llama 4 Maverick on a single H100 DGX host for easy deployment.



Gny4tEiXUAAhIHJ.jpg
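For readers new to MoE: only a few "expert" feed-forward blocks run per token, which is how a model with hundreds of billions of total parameters can have just 17B active parameters. A toy top-k routing sketch, not Llama 4's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, router_w, top_k=1):
    """Toy mixture-of-experts layer: route a token to its top-k experts.

    `experts` is a list of weight matrices; `router_w` produces per-expert
    logits. Illustrative only -- real MoE layers add load balancing,
    capacity limits, etc.
    """
    logits = x @ router_w                 # (num_experts,) routing scores
    top = np.argsort(logits)[-top_k:]     # indices of the chosen experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                          # softmax over selected experts only
    # only the chosen experts execute -- the "active parameter" savings
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

d, n_exp = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(n_exp)]
router_w = rng.standard_normal((d, n_exp))
y = moe_forward(rng.standard_normal(d), experts, router_w, top_k=2)
assert y.shape == (d,)
```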


7/16
@omarsar0
Meta claims Llama 4 Maverick achieves comparable results to DeepSeek v3 on reasoning and coding, at half the active parameters.



Gny6lZ_WgAAvw5_.jpg


8/16
@omarsar0
The long context support is gonna be huge for devs building agents.

There is more coming, too!

Llama 4 Reasoning is already cooking!



https://video.twimg.com/ext_tw_video/1908606494527893504/pu/vid/avc1/1280x720/8gb5oYcDl093QmYm.mp4

9/16
@omarsar0
Download the Llama 4 Scout and Llama 4 Maverick models today on Llama and Hugging Face.

Llama 4 (via Meta AI) is also available to use in WhatsApp, Messenger, Instagram Direct, and on the web.



Gny8WKLWgAAxnHq.jpg


10/16
@omarsar0
HF models: meta-llama (Meta Llama)

Great guide on Llama 4 is here: Llama 4 | Model Cards and Prompt formats

Detailed blog: The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation



Gny93haXkAAGNpV.jpg


11/16
@omarsar0
The model backbone seems to use early fusion to integrate text, image, and video tokens.

Post-training pipeline: lightweight SFT → online RL → lightweight DPO.

They state that the overuse of SFT/DPO can over-constrain the model and limit exploration during online RL and suggest keeping it light instead.



GnzAPabW0AA5Z5H.jpg


12/16
@omarsar0
It seems to be available on Fireworks AI APIs already:

[Quoted tweet]
🔥🦙 llama4 launch on @FireworksAI_HQ ! 🦙🔥

Llama4 has just set a new record—not only among open models but across all models. We’re thrilled to be a launch partner with @Meta to provide easy API access to a herd of next-level intelligence!

The herd of models launched are in a class of their own, offering a unique combination of multi-modality and long-context capabilities (up to 10 million tokens!). We expect a lot of active agent development to experiment and go to production with this new set of models.

Our initial rollout includes both Scout and Maverick models, with further optimizations and enhanced developer toolchains launching soon.

You can access the model APIs below, and we can't wait to see what you build!

🔥 llama4- scout: fireworks.ai/models/firework…
🔥 llama4 - maverick: fireworks.ai/models/firework…
[media=twitter]1908610306924044507[/media]


13/16
@omarsar0
Besides the shift to MoE and native multimodal support, how they aim to support "infinite" context length is a bit interesting.

More from their long context lead here:

[Quoted tweet]
Our Llama 4’s industry leading 10M+ multimodal context length (20+ hours of video) has been a wild ride. The iRoPE architecture I’d been working on helped a bit with the long-term infinite context goal toward AGI. Huge thanks to my incredible teammates!

🚀Llama 4 Scout
🔹17B active params · 16 experts · 109B total params
🔹Fits on a single H100 GPU with Int4
🔹Industry-leading 10M+ multimodal context length enables personalization, reasoning over massive codebases, and even remembering your day in video

🚀Llama 4 Maverick
🔹17B active params · 128 experts · 400B total params · 1M+ context length
🔹Experimental chat version scores ELO 1417 (Rank #2) on LMArena

🚀Llama 4 Behemoth (in training)
🔹288B active params · 16 experts · 2T total params
🔹Pretraining (FP8) with 30T multimodal tokens across 32K GPUs
🔹Serves as the teacher model for Maverick codistillation

🚀All models use early fusion to seamlessly integrate text, image, and video tokens into a unified model backbone.
🚀Our post-training pipeline: lightweight SFT → online RL → lightweight DPO. Overuse of SFT/DPO can over-constrain the model and limit exploration during online RL—keep it light.

💡Solving long context by aiming for infinite context helps guide better architectures.
We can't train on infinite-length sequences—so framing it as an infinite context problem narrows the solution space, especially via length extrapolation: train on short, generalize to much longer.

Enter the iRoPE architecture (“i” = interleaved layers, infinite):
🔹Local parallelizable chunked attention with RoPE models short contexts only (e.g., 8K)
🔹Only global attention layers model long context (e.g., >8K) without position embeddings—improving extrapolation. Our max training length: 256K.
🔹As context increases, attention weights flatten—making inference harder. To compensate, we apply inference-time temperature scaling at global layers to enhance long-range reasoning while preserving short-context (e.g., α=8K) performance:

xq *= 1 + log(floor(i / α) + 1) * β # i = position index

We believe in open research. We'll share more technical details very soon—via podcasts. Stay tuned!
[media=twitter]1908595612372885832[/media]

GnzC9InWYAAldoP.png

GnyxN4pbYAA7lfC.png

GnyxN5WbYAAgFmR.jpg

GnyxN5TbwAAbd3Z.jpg

GnyxN6gbYAMAJJU.jpg
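The `xq *=` line in the quote above is the inference-time attention temperature scaling. A runnable toy version; β is not specified in the thread, so the value used here is an illustrative assumption:

```python
import math

def scaled_query(xq, pos, alpha=8192, beta=0.1):
    """Inference-time temperature scaling as described in the quote above.

    alpha is the short-context window (e.g. 8K tokens); beta is a tuning
    constant -- its actual value is not given in the thread, so 0.1 is an
    assumption for illustration.
    """
    # Positions beyond alpha get a logarithmically growing boost, sharpening
    # attention weights that would otherwise flatten at long range.
    scale = 1 + math.log(pos // alpha + 1) * beta
    return [q * scale for q in xq]

# Positions within the first alpha tokens are untouched (scale == 1.0):
print(scaled_query([1.0, 2.0], pos=4096))  # [1.0, 2.0]
```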


14/16
@omarsar0
Licensing limitations: If over 700M monthly active users, you need to request a special license.

[Quoted tweet]
Llama 4's new license comes with several limitations:

- Companies with more than 700 million monthly active users must request a special license from Meta, which Meta can grant or deny at its sole discretion.

- You must prominently display "Built with Llama" on websites, interfaces, documentation, etc.

- Any AI model you create using Llama Materials must include "Llama" at the beginning of its name

- You must include the specific attribution notice in a "Notice" text file with any distribution

- Your use must comply with Meta's separate Acceptable Use Policy (referenced at llama.com/llama4/use-policy)

- Limited license to use "Llama" name only for compliance with the branding requirements
[media=twitter]1908602756182745506[/media]


15/16
@omarsar0
This 2 trillion total parameter model (Behemoth) is a game-changer for Meta.

They had to revamp their underlying RL infrastructure due to the scale.

They're now positioned to unlock insane performance jumps and capabilities for agents and reasoning going forward. Big moves!



GnzIV_sX0AAqd9s.jpg


16/16
@omarsar0
I expected nothing less. It's great to see Meta become the 4th org to break 1400 (#2 overall) on the Arena.

What comes next, as I said above, is nothing to ignore. Open-source AI is going to reach new heights that will break things.

OpenAI understands this well.



GnzLFtRXEAA_gti.jpg
 

1/9
@boundlessanurag
Generative AI is making biotech research efficient.

Human creativity, trial-and-error experimentation, and painstaking iteration have powered biological research for ages. Advances in enzyme engineering and drug discovery needed wet lab labor.



2/9
@boundlessanurag
AI is set to fill the void and accelerate biotech research.  Generative AI is disrupting software development and biological sciences by redesigning proteins, manipulating enzymes, and accurately anticipating molecular interactions.



3/9
@boundlessanurag
AI is replacing laborious experiments in biological research with simulations. AI can anticipate and design proteins, enabling advancements in enzyme engineering, antibody treatments, and small-molecule medicines.



4/9
@boundlessanurag
From DeepMind's AlphaFold-2 and AlphaFold-3, to Meta's ESMFold, which predicts protein structures 60x faster than previous approaches, to startups like Latent Labs, which aims to create AI-designed, patient-specific medicinal compounds, these companies are shaping this transition.



5/9
@boundlessanurag
The Institute for Protein Design created 10,000 enzyme prototypes using RFDiffusion, an AI tool, then used AlphaFold2 and PLACER to select the most promising candidates. After numerous rounds of testing, they found enzymes that could repeatedly catalyze reactions, a first for designed proteins of this complexity.



6/9
@boundlessanurag
NVIDIA and Arc Institute's Evo 2, the largest open-source biological AI model, marks a major advance for the field. Evo 2 was trained on 9.3 trillion nucleotides from over 128,000 whole genomes, a breakthrough in generative biology and AI-driven genomics.



7/9
@boundlessanurag
Evo 2 can discover disease-causing mutations, including BRCA1 gene mutations, with over 90% accuracy.



8/9
@boundlessanurag
Could AI capabilities such as quick data processing and the ability to derive hidden insights offer up new paths in longevity research and the discovery of treatments for chronic conditions?



9/9
@boundlessanurag
We feel it is realistic and will come to pass soon.
#artificialintelligence #biotechresearch #enzymestructure #drugdiscovery #Evo2 #generativeAI




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/1
@Doghablaespanol
꧁𓊈𒆜DeSci𒆜꧂ ꧁𓊈𒆜DesAI𒆜꧂ ꧁𓊈𒆜Market Updates𒆜𓊉꧂
📅 2025-02-25

✔️ Biggest AI Model in Biology: Evo 2
🧬 Evo 2, co-developed by Arc Institute, @Stanford, and @NVIDIA, is the largest AI model for biology, trained on 128,000 genomes across species.
🔍 Capabilities:

Can write whole chromosomes and small genomes from scratch.

Helps analyze non-coding DNA linked to diseases.

Predicts effects of mutations, including those in BRCA1 (linked to breast cancer).

⚡ Key Features:

State-of-the-art genome analysis with 9.3 trillion DNA letters.

Processes long-range DNA interactions up to 1 million base pairs apart.

Can assist in CRISPR and gene-editing innovations.

🚀 Breakthrough for Genomics & Medicine:

Enhances disease research, personalized medicine, and synthetic biology.

Supports deeper insights into regulatory DNA for biotech advancements.

Source: Redirect Notice

#AI #Genomics #Biotech #CRISPR #SyntheticBiology #Evo2

🌈⃤ 🅐🅛🅔🅧 #🅓🅔🅟🅘🅝 #🅓🅔🅢🅒🅘 #🅓🅔🅢🅐🅘 #D̳̿͟͞e̳̿͟͞S̳̿͟͞c̳̿͟͞i̳̿͟͞C̳̿͟͞u̳̿͟͞l̳̿͟͞t̳̿͟͞





1/11
@medvolt_ai_
🧵 AI-Powered Bioengineering with Evo 2 – A New Era Begins 🚀



Gki-jb0W4AEhC6q.png


2/11
@medvolt_ai_
1️⃣ Synthetic biology has long aimed to design biological systems with engineering precision. Yet, nature’s complexity has made this difficult—until now. AI is changing the game. 🔬



3/11
@medvolt_ai_
2️⃣ Meet Evo 2, the latest AI model from Arc Institute & NVIDIA. Trained on 128,000 genomes, it can predict mutations, generate DNA sequences, and model long-range genetic interactions. 🧬



Gki-35kWgAA3ouQ.jpg


4/11
@medvolt_ai_
3️⃣ Evo 2’s 40B parameters and extended context window allow it to analyze entire genes, regulatory regions, and chromatin interactions—essential for understanding genome function.



5/11
@medvolt_ai_
4️⃣ Key capabilities of Evo 2:
✅ Predicts pathogenic mutations in seconds 🦠
✅ Generates new DNA sequences (small bacterial genomes, yeast chromosomes) 🔬
✅ Identifies long-range genome interactions—critical for regulation & disease insights 🧠



6/11
@medvolt_ai_
5️⃣ Evo 2 doesn’t just memorize data—it has learned fundamental biological concepts like:
🔹 Protein structures 🏗️
🔹 Viral DNA signatures 🦠
🔹 Gene regulation & chromatin accessibility 🧬



7/11
@medvolt_ai_
6️⃣ Why does this matter? AI-driven genome design could revolutionize:
🔹 Synthetic biology 🏭
🔹 Genetic therapies 🏥
🔹 Biofuels & biomanufacturing 🌱
🔹 Drug discovery & precision medicine 💊



8/11
@medvolt_ai_
7️⃣ This breakthrough parallels AlphaFold's impact on protein folding—Evo 2 makes bioengineering more predictable, scalable, and accessible than ever before.



Gki_IKpWAAAzaDD.jpg


9/11
@medvolt_ai_
8️⃣ Evo 2 is open-source and available for researchers via API. Could this be the beginning of AI-driven genetic design? The future of synthetic biology is here.

/search?q=#AI /search?q=#SyntheticBiology /search?q=#Genomics /search?q=#DrugDiscovery /search?q=#Bioengineering



10/11
@medvolt_ai_
At Medvolt, we harness the power of generative AI, alongside other large language models (LLMs) and deep learning technologies, through our innovative platform 𝐌𝐞𝐝𝐆𝐫𝐚𝐩𝐡.



11/11
@medvolt_ai_
𝐅𝐞𝐞𝐥 𝐟𝐫𝐞𝐞 𝐭𝐨 𝐜𝐨𝐧𝐭𝐚𝐜𝐭 𝐮𝐬 𝐢𝐟 𝐲𝐨𝐮 𝐡𝐚𝐯𝐞 𝐚𝐧𝐲 𝐢𝐧𝐪𝐮𝐢𝐫𝐢𝐞𝐬 𝐨𝐫 𝐫𝐞𝐪𝐮𝐢𝐫𝐞 𝐚 𝐝𝐞𝐦𝐨𝐧𝐬𝐭𝐫𝐚𝐭𝐢𝐨𝐧.

Visit our website: Medvolt | AI Platform for Drug Discovery and Repurposing or reach out to us via email: contact@medvolt.ai





1/21
@arcinstitute
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. Manuscript | Arc Institute



GkKdpdoaAAA5u1d.jpg


2/21
@arcinstitute
Trained on 9.3 trillion nucleotides from over 128,000 archaeal, prokaryotic, and eukaryotic genomes, Evo 2 brings the power of large language models to biology, enabling new discoveries in bioengineering and medicine. AI can now model and design the genetic code for all domains of life with Evo 2 | Arc Institute



3/21
@arcinstitute
A collaboration between @arcinstitute, @NVIDIAHealth, @Stanford, @UCBerkeley, and @UCSF, Evo 2 is fully open source - including training data, code, and model weights, now on the @nvidia BioNeMo platform. evo2-40b Model by Arc | NVIDIA NIM



GkKeWppXwAEJZLv.jpg


4/21
@arcinstitute
To explore the model, try the user-friendly interface Evo Designer to generate DNA by sequence, species, and more: Evo 2: DNA Foundation Model | Arc Institute



5/21
@arcinstitute
Arc Institute also worked with AI research lab @GoodfireAI to develop a mechanistic interpretability visualizer that uncovers the key biological features and patterns the model learns to recognize in genomic sequences: Evo 2: DNA Foundation Model | Arc Institute



GkKe6Hxb0AA_tLw.jpg


6/21
@WilliamLamkin
Awesome research! Congratulations to all the collaborators



7/21
@vedangvatsa
Hidden Gems in Evo 2's paper

[Quoted tweet]
🧵 Hidden Gems in Evo 2’s Paper

A groundbreaking biological foundation model trained on 9.3 trillion DNA base pairs.
[media=twitter]1892300017005650411[/media]

GkLHWmObUAI9uom.png


8/21
@Molecule_dao
This is massive



9/21
@shae_mcl
Enormous fan of this work - I’m curious how Evo 2 performs on some of the standard genome understanding evaluations like GUE? It would be great to compare all the existing sota models on a common set of downstream tasks



10/21
@is_OwenLewis
Massive news! Biology just took a giant leap forward.



11/21
@BondeKirk
This seems like it lowers the bar for creating novel pathogens. I'd be grateful to understand the reasoning behind the decision to open-source this.

Since ARC is filled with smart and thoughtful people, surely the dual-use nature of this AI model crossed your mind?



12/21
@AllThingsApx
A fun gold nugget from the ablation experiments in the appendix:

Focusing pretraining on functional regions (rather than whole genomes) improved predictive performance, especially for noncoding variants!

Cool stuff.



13/21
@oboelabs
it's wild how much genetic code and computer code have in common! ai is now being used to debug both



14/21
@bobbysworld69
Will it cure cancer



15/21
@Hyperstackcloud
Incredible work team! 👏



16/21
@TheOriginalNid
Weird, my name isn’t on there.



17/21
@Math_MntnrHZ
Evo 2’s precision in genetic code design could accelerate biotech innovations, redefining genome engineering. A pivotal moment for life sciences!



18/21
@smdxit
So many tools this day, so little time for experimenting 😍😭



19/21
@Digz_0
YUGE



20/21
@karmicoder
I always wanted to understand life and this is the perfect thing to get deeper into it. 😍



21/21
@Moozziii
😲




 


1/25
@BrianRoemmele
This AI can write genomes from scratch.

The Arc Institute with NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

This AI doesn’t just analyze genomes. it creates them.

Link: Manuscript | Arc Institute



GkLcHHnbUAALD_e.jpg


2/25
@MarceauFetiveau
The AI revolution in medicine is happening right now.

Diseases we once thought impossible to beat? Their days are numbered.

Get ready to live longer, healthier, and witness the future of medicine unfold.



3/25
@Birdmeister17
I had Grok analyze and summarize this 65 page document so normal people can understand it. We are living in the future. https://grok.com/share/bGVnYWN5_76007630-a88b-4fb8-a71e-12dd4faafc61



4/25
@snarkyslang
This may be a dumb question, but creating genomes from nothing is essentially designing life that doesn't presently exist, is that right?



5/25
@EagleTurd
That's scary actually.



6/25
@NaturallyDragon
Bring back Dragons!



https://video.twimg.com/ext_tw_video/1892322995575934981/pu/vid/avc1/1920x1080/GJ2tuPGw6OXmpNhy.mp4

7/25
@mark74181442
Soon, we will be able to 3-d print it too.



8/25
@Ronan_A1
This is so wild…. The exponential change is mind boggling

[Quoted tweet]
HOLY shyt IT'S HAPPENING

AI can now write genomes from scratch.
Arc Institute and NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

it doesn’t just analyze genomes. it creates them
1/
[media=twitter]1892251343881937090[/media]

GkKeXc4XgAAC3pg.jpg


9/25
@TheLastDon222
And this is where things get scary. I think we may be entering into a tower of Babel situation.



10/25
@atlasatoshi
🤯



11/25
@smbodie3
Scary.



12/25
@neilarmstrongx1
this week is wild, we are all of a sudden in my childhood science fiction distant future. I wouldn't be surprised if Buck Rogers showed up 😆



13/25
@pilot_winds
@robertsepehr you might find this useful.



14/25
@ChrisandOla
Is this a good thing?



15/25
@waldematrix
Great😑🥺 now they are really going to make cat girls and Werewolves.



16/25
@calledtocnstrct
Now if a cell simulator can be built to model what happens when the 🧬 is controlling things... Would be interesting to see what creature develops.



17/25
@CPtte
Wow!



18/25
@CPtte
Just the models of them or are we talking create them in physical form? As an actual strand of DNA?



19/25
@Dusty45Cal
I might read this to my kids tonight.



20/25
@fwchiro
When do I get an uplifted dog that can help with the chores?



21/25
@AltVRat
Plus crisper and 💥 Singularity Escape Velocity



22/25
@Mike___Kilo
You sure that’s a good idea? Just because we can, doesn’t mean we should.



23/25
@D0cCoV
Let it analyze the SARS-CoV-2 genome!



24/25
@MonaLiesbeth
I really don’t think this is a good idea. 😳



25/25
@DaneFreshie
The greater familiarity with biology one has, the more concerning this is.


1/34
@IterIntellectus
HOLY shyt IT'S HAPPENING

AI can now write genomes from scratch.
Arc Institute and NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

it doesn’t just analyze genomes. it creates them
1/



GkKeXc4XgAAC3pg.jpg


2/34
@IterIntellectus
Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome-scale

Evo 2 is FULLY OPEN, including model parameters, training code, inference code, and
the OpenGenome2 dataset LMFAO

2/



GkKgd9FWkAEbirb.jpg


3/34
@IterIntellectus
think of it as a DNA-focused LLM. instead of text, it generates genomic sequences. it reads and interprets complex DNA, including noncoding regions usually dismissed as junk; generates entire chromosomes and new genomes; and predicts disease-causing mutations, even those not yet understood

3/
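The "DNA-focused LLM" framing can be made concrete with a toy autoregressive sampler over the four bases. The fixed next-base distribution below is a stand-in, not Evo 2's actual model:

```python
import random

random.seed(0)

def sample_dna(model_probs, length=12):
    """Toy autoregressive DNA sampler: draw one base at a time conditioned
    on the sequence so far, like a language model over nucleotides.

    `model_probs(context) -> dict` should return next-base probabilities;
    here we fake it with a context-independent GC-rich distribution.
    """
    seq = []
    for _ in range(length):
        probs = model_probs("".join(seq))
        bases, weights = zip(*probs.items())
        seq.append(random.choices(bases, weights=weights)[0])
    return "".join(seq)

# stand-in "model": a fixed distribution, purely illustrative
fake_model = lambda ctx: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
s = sample_dna(fake_model)
assert len(s) == 12 and set(s) <= set("ACGT")
```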



4/34
@IterIntellectus
this is biology hacking
AI is moving beyond describing biology to designing it. this allows for synthetic life engineered from scratch, programmable genomes optimized by AI, potential new gene therapies, and lays the groundwork for whole-cell simulation.
biology is becoming a computational discipline

4/



5/34
@IterIntellectus
it was trained on a dataset of 9.3 trillion base pairs from bacteria, archaea, eukaryotes, and bacteriophages
it processes up to 1 million base pairs in a single context window, covering entire chromosomes. it identifies evolutionary patterns previously unseen by humans

5/



6/34
@IterIntellectus
Evo-2 has demonstrated practical generation abilities, creating synthetic yeast chromosomes, mitochondrial genomes, and minimal bacterial genomes.
this is computational design in action.

6/



GkKhcA0XoAAS-Zg.jpg


7/34
@IterIntellectus
Evo-2 understands noncoding DNA, which regulates gene expression and is involved in many genetic diseases.
it predicts the functional impact of mutations in these regions, achieving state-of-the-art performance on noncoding variant pathogenicity and BRCA1 variant classification.
this could lead to advances in precision medicine

7/



8/34
@IterIntellectus
Evo-2 uses StripedHyena 2, a hybrid architecture combining convolution and attention mechanisms rather than a pure transformer stack.
it models DNA at multiple scales, capturing long-range interactions, and autonomously learns features like exon-intron boundaries and transcription factor binding sites without human guidance.
it’s not just memorizing
it’s understanding biology.

8/



GkKiHsQXYAAIk4k.jpg


9/34
@IterIntellectus
Evo-2 predicts whether mutations are harmful or benign without specific training on human disease data.

WHAT

it outperforms specialized models on BRCA1 variants and handles noncoding mutations effectively, suggesting it has learned DNA’s fundamental principles

9/
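Zero-shot variant effect prediction with genomic language models is typically done by comparing sequence likelihoods. The thread doesn't spell out Evo 2's exact procedure, so this is the generic recipe with a stand-in scoring function:

```python
def variant_score(loglik, ref_seq, alt_seq):
    """Toy zero-shot variant scoring: delta log-likelihood between the
    reference and mutated sequence under a sequence model.

    A more negative delta means the model finds the mutant less plausible,
    suggesting the variant is deleterious. `loglik` is a stand-in here --
    a real pipeline would query a trained genomic model.
    """
    return loglik(alt_seq) - loglik(ref_seq)

# stand-in scorer: penalizes each 'T' (purely illustrative)
fake_loglik = lambda s: -s.count("T") * 0.5
delta = variant_score(fake_loglik, "ACGA", "ACGT")
assert delta < 0  # the substitution lowered the toy likelihood
```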



10/34
@IterIntellectus
Evo-2 generates DNA sequences that influence chromatin accessibility, controlling gene expression.

it has embedded simple Morse code into epigenomic designs as a proof of concept, not a practical application. this shows potential for designing programmable gene circuits.

make me blonde. thank you

10/



11/34
@IterIntellectus
Evo-2 is FULLY OPEN SOURCE, including model parameters, training data, and code.
this will lead to massive widespread innovation in bioengineering, lowering barriers to genome design.
it’s a revolution moment for the field.

the era of biotech is here

11/



12/34
@IterIntellectus
The Arc Institute aims to model entire cells, moving beyond DNA to whole organisms.

this could lead to AI creating new life forms and synthetic biology becoming AI-driven.

the future involves programming life at increasing scales

12/



13/34
@IterIntellectus
three years ago, AI focused on chatbots.
now it generates genomes. soon, it will design complex biological systems. this is a new phase
humans are no longer just studying biology but rewriting its code.

biology’s future is computational. are you prepared?

13/



14/34
@IterIntellectus
Manuscript | Arc Institute



15/34
@boneGPT
if it's possible to use AI to create every possible genome in the latent space of genes, there is a case that humans could have been discovered galaxies away.

Any sufficiently advanced species would know about man long before ever making contact with earth.



16/34
@IterIntellectus
soon



17/34
@kaiotei_
This is insane and a true breakthrough sheesh. just last week i finally had the mind to feed my genome into chatgpt and have it run dozens of analyses on my SNPs; genomics and AI is big. This is kind of terrifying to me though because I just imagine it creating super extra covid



18/34
@IterIntellectus
or super soldiers lesssgooooo



19/34
@NickADobos
How soon can I can I CRISPR myself with this?



20/34
@IterIntellectus
realistically speaking, safely, 10-15 years
discard safety, 5



21/34
@parakeetnebula
WE LOVE TO SEE IT



22/34
@IterIntellectus
HELL YEAH



23/34
@thatsallfrens
I WANT A GNOME FROM SCRAC



24/34
@IterIntellectus
I WANT BIG PP



25/34
@AISafetyMemes
*taps sign*

[Quoted tweet]
MIT professor and CRISPR gene drive inventor is sounding the alarm:

A 90% lethal virus that infects 50% of humans in 100 days is now possible.

Vaccines will be far too slow.

Extinction cult Aum Shinrikyo went to Africa to produce purified ebola, but, lucky for us, they failed:

Kevin Esvelt: “They bought a uranium mine, they started developing chemical weapons, they started looking for biological weapons. And while there weren’t very many that they had access to at the time, they were able to produce botulinum toxin and they tried to make enough anthrax.

The leader of their bioweapons programme, when they went to Africa, he was hoping that they would find someone who was infected with Ebola so that he could purify the virus and spread it around, so that it would hopefully transmit and kill as many people as possible.”

Omicron infected 50% of Europe in 100 days - vaccines will be much too slow: “Now, imagine something that was released across multiple airports to start with, and you can see how the moonshot vaccine initiatives that hope to get a new vaccine working and approved in 100 days are still going to be much too slow."

90% lethality is possible: “Rabbit calicivirus — it’s more than 90% lethal in adult rabbits. If nature can do that in an animal, that means it’s possible.”

It could spread far faster than natural pandemics because we have air travel.

A new RAND report said: “Previous attempts to weaponise biological agents, such as an attempt by the Japanese Aum Shinrikyo cult to use botulinum toxin in the 1990s, had failed because of a lack of understanding of the bacterium. AI could “swiftly bridge such knowledge gaps”

Bioweapons experts think we might be 1-3 years away from AI-assisted large-scale biological attacks that bring society to its knees.

Open source AI means irreversible proliferation to the Aum Shinrikyos of the world. We’re giving them weapons we can never take back.
[media=twitter]1714384953696211345[/media]

F8ptBmeWsAAQdE-.jpg


26/34
@AsycLoL
Wtf



27/34
@zeee_media
@threadreaderapp unroll



28/34
@vvdecay
@SorosBruv



29/34
@NI_Intern
Does this mean we're getting Jurassic Park



30/34
@theshadow27
I’m not normally a doomer but this one gives me pause.

An open source version of this means it’s possible to tell it to mix anthrax with Covid. Or worse. CRiSPeR can print any sequence.

What have we done? 😳



31/34
@biotech_bro
Genuinely not overstating how important and transformative it is. You can synthesize organisms that break down plastics, bio-synthetically generate key products and reagents (e.g., Ginkgo on steroids), and maybe even help terraform planets!!!



32/34
@_anoneng
YAY! Now we can have "life forms" whose genetics are the equivalent of chatgpt slop that barely qualifies as english.

[Quoted tweet]
HOLY shyt IT'S HAPPENING

AI can now write genomes from scratch.
Arc Institute and NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life.

it doesn’t just analyze genomes. it creates them
1/
[media=twitter]1892251343881937090[/media]

GkKeXc4XgAAC3pg.jpg


33/34
@craigh64
Now we just need it to design some sort of "perfect organism" that we can use as a weapon!



GkM-5BtXgAA3eY9.png


34/34
@8_O_B




GkLRwWjW0AAAArF.jpg
 















1/16
@buildthatidea
This was one of those weeks where decades happened

- Grok 3 goes public
- Perplexity R1-1776
- OpenAI SWE-Lancer
- Google AI co-scientist
- Nvidia and Arc's Evo 2
- Microsoft’s quantum breakthrough

Here's what you need to know 🧵



2/16
@buildthatidea
1/ We shipped Build That Idea

It lets anyone launch AI agents in 60 seconds and monetize them

so easy your mum can do it ❤️

Sign up here: BuildThatIdea - Launch GPT Wrappers in 60 Seconds!

[Quoted tweet]
Introducing Build That Idea

A platform that lets anyone launch their own AI agent in 60 seconds

- Define your Agent
- Choose a base LLM (OpenAI, Claude, DeepSeek, etc.)
- Upload knowledge base
- Set pricing and start making money

Join the waitlist below


https://video.twimg.com/ext_tw_video/1892956745795776512/pu/vid/avc1/720x728/Peu6tj9FT0ofOguD.mp4

3/16
@buildthatidea
2/ xAI team dropped the world’s smartest AI on earth

- Trained on 10x more compute than Grok-2
- Ranks #1 on Chatbot Arena.
- Outperforms the top reasoning models from Google and OpenAI

and they built it in just 19 months. Insane 🫡

[Quoted tweet]
This is it: The world’s smartest AI, Grok 3, now available for free (until our servers melt).

Try Grok 3 now: nitter.poast.org/i/grok

X Premium+ and SuperGrok users will have increased access to Grok 3, in addition to early access to advanced features like Voice Mode


https://video.twimg.com/ext_tw_video/1892399262706913282/pu/vid/avc1/1156x720/_3eBbHGohdrajxiX.mp4

4/16
@buildthatidea
3/ OpenAI introduced SWE-Lancer, a benchmark that tests AI models' performance on freelance jobs from Upwork.

It includes 1,400 tasks worth over $1m in economic value

They found that Claude 3.5 is better at coding than OpenAI's own GPT-4o and o1 😂

[Quoted tweet]
Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. openai.com/index/swe-lancer/


5/16
@buildthatidea
4/ Google released AI Co-Scientist, a multi-agent AI system built with Gemini 2.0

It can generate hypotheses, validate research, and accelerate scientific discoveries in medicine, chemistry, and genetics.

It has already discovered a new drug for blood cancer.

https://nitter.poast.org/GoogleAI/status/1892214154372518031



6/16
@buildthatidea
5/ Microsoft made a big breakthrough in quantum computing with its new Majorana 1 chip.

This new chip uses a topological qubit design, bringing us closer to scalable quantum computing with millions of qubits on a single chip.



https://video.twimg.com/amplify_video/1892245060369960960/vid/avc1/1920x1080/hITyM4HiQg8hooNb.mp4

7/16
@buildthatidea
6/ Microsoft Research released BioEmu-1, a deep learning model for predicting protein folding and movement at scale.

It can generate protein structures 100,000x faster than traditional simulations

The best part? It's free for researchers worldwide via Azure AI Foundry Labs.

https://nitter.poast.org/MSFTResearch/status/1892597609769918637



8/16
@buildthatidea
7/ Arc Institute and NVIDIA dropped the world's largest AI model for biology

It can predict harmful mutations, design synthetic genomes, and understand the fundamental code of life

and it’s fully open-source

[Quoted tweet]
Announcing Evo 2: The largest publicly available, AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. arcinstitute.org/manuscripts…


GkKdpdoaAAA5u1d.jpg


9/16
@buildthatidea
8/ Perplexity released R1 1776 to remove Chinese Communist Party censorship.

It's a version of DeepSeek R1 that provides uncensored and unbiased answers.

Available via the Sonar API

[Quoted tweet]
Today we're open-sourcing R1 1776—a version of the DeepSeek R1 model that has been post-trained to provide uncensored, unbiased, and factual information.


https://video.twimg.com/amplify_video/1891916471498133504/vid/avc1/1280x720/07DHKn6Y6-SNDcjv.mp4

10/16
@buildthatidea
9/ Microsoft introduced Muse AI, a new tool that helps game developers create new gameplay ideas and bring old games back to life

It was trained on over seven years of continuous gameplay data and is now open-sourced for developers.

[Quoted tweet]
If you thought AI-generated text, images, and video were cool, just imagine entire interactive environments like games!


https://video.twimg.com/ext_tw_video/1892243717056315392/pu/vid/avc1/1280x720/W8hp4ZUIc-k8RBJL.mp4

11/16
@buildthatidea
10/ Mistral released Saba, a language model designed for Middle Eastern and South Asian regions.

It supports Arabic, Tamil, and Malayalam and is optimized for conversational AI and culturally relevant content.

Available via API

[Quoted tweet]
🏟️Announcing @MistralAI Saba, our first regional language model.
- Mistral Saba is a 24B parameter model trained on meticulously curated datasets from across the Middle East and South Asia.
- Mistral Saba supports Arabic and many Indian-origin languages, and is particularly strong in South Indian-origin languages such as Tamil and Malayalam.


Gj_pZTyXIAAQrEW.png


12/16
@buildthatidea
11/ Convergence introduced the world's most capable web-browsing agent

It can:
- Type, scroll, and click on websites autonomously
- Manage software via web-based interfaces
- Automate workflows with scheduled tasks

[Quoted tweet]
Introducing Proxy 1.0 - the world's most capable web-browsing agent.


https://video.twimg.com/ext_tw_video/1892115303347138560/pu/vid/avc1/1280x720/zhMbFGWh2iq6_M7t.mp4

13/16
@buildthatidea
12/ Pika launched PikaSwaps, an AI-powered video editing tool

It lets you swap people and objects in any video using AI

The results are so realistic that you can’t even tell they were edited



https://video.twimg.com/ext_tw_video/1892999984649265152/pu/vid/avc1/1280x720/4DgfyiZ0hIiVKQZZ.mp4

14/16
@buildthatidea
That's a wrap! Hope you enjoyed it

If you found this thread valuable:

1. Follow @0xmetaschool for more
2. Drop a like or retweet on the first tweet of this thread

[Quoted tweet]
This was one of those weeks where decades happened

- Grok 3 goes public
- Perplexity R1-1776
- OpenAI SWE-Lancer
- Google AI co-scientist
- Nvidia and Arc's Evo 2
- Microsoft’s quantum breakthrough

Here's what you need to know 🧵


15/16
@CharlesHL
ww @readwise save thread



16/16
@almemater
Since this is happening,
would love to see a ‘Debug Web3 marketing’ thread for all the web3 newbies.

@fatimarizwan how about a collaboration for clear skin?🎀




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 


How are AI Image Generation Models Built?​


Learn about AI image generation models, how they work, and how they are built from scratch.

Rajat Dangi · March 28, 2025 · 9 min read




Key Takeaways​


  • AI image generation models, like those behind ChatGPT 4o and DALL-E, Google Gemini, Grok, and Midjourney, are built using advanced machine learning techniques, primarily diffusion models, with Grok using a unique autoregressive approach.
  • These models require vast datasets of images and text, powerful computing resources like GPUs, and expertise in machine learning and computer vision.
  • Building one from scratch involves collecting data, designing model architectures, and training them, which is resource-intensive and complex.



A candid paparazzi-style photo of Karl Marx hurriedly walking through the parking lot of the Mall of America, glancing over his shoulder with a startled expression as he tries to avoid being photographed. He’s clutching multiple glossy shopping bags filled with luxury goods. His coat flutters behind him in the wind, and one of the bags is swinging as if he’s mid-stride. Blurred background with cars and a glowing mall entrance to emphasize motion. Flash glare from the camera partially overexposes the image, giving it a chaotic, tabloid feel. - ChatGPT 4o


Understanding AI Image Generation​


AI image generation has transformed how we create visual content, enabling tools like ChatGPT 4o, OpenAI DALL-E, Imagen by Google, Aurora by xAI, and Midjourney to produce photorealistic or artistic images from text descriptions. These models are at the heart of popular platforms, making their construction worth understanding, whether out of technical interest or plain curiosity.




What It Takes To Build Image Generation Models from Scratch​


Creating an AI image generator involves:

  • Data Needs: Millions of image-text pairs, like those used for DALL-E, ensuring diversity for broad concept coverage.
  • Compute Power: Requires GPUs or TPUs for training, with costs in thousands of GPU hours.
  • Expertise: Knowledge in machine learning, computer vision, and natural language processing is crucial, alongside stable training techniques.
  • Challenges: Includes ethical concerns like bias prevention and high computational costs, with diffusion models offering stability over older GANs.

This process is complex, but understanding it highlights the innovation behind these tools, opening doors for future advancements.

Exploring Different AI Image Generation Models​


AI image generation has revolutionized creative industries, enabling the production of photorealistic and artistic images from textual prompts. Tools like DALL-E, Imagen, Aurora, and Midjourney have become household names, integrated into platforms like ChatGPT, Google Gemini, Grok, and Midjourney. This section delves into the technologies behind these models and the intricate process of building them from scratch, catering to both technical and non-technical audiences.



Popular AI Image Generators​


Several prominent AI image generators have emerged, each with distinct technological underpinnings:

  • DALL-E (OpenAI): Likely the backbone of ChatGPT's image generation, especially versions like ChatGPT 4o, DALL-E uses diffusion models. The research paper "Hierarchical Text-Conditional Image Generation with CLIP Latents" details DALL-E 2's architecture, which involves a prior generating CLIP image embeddings from text and a decoder using diffusion to create images. This model, with 3.5 billion parameters, enhances realism and resolution, and is integrated into ChatGPT for seamless user interaction.
  • Google Gemini (Imagen): Google Gemini leverages Imagen 3 for image generation, as noted in recent updates ("Google Gemini updates: Custom Gems and improved image generation with Imagen 3"). Imagen uses diffusion models, with the research paper "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding" describing its architecture. It employs a large frozen T5-XXL encoder for text and conditional diffusion models for image generation, achieving a COCO FID of 7.27, indicating strong image-text alignment.
  • Grok (Aurora by xAI): Grok, developed by xAI, uses Aurora for image generation, as announced in the blog post "Grok Image Generation Release". Unlike the others, Aurora is an autoregressive mixture-of-experts network, trained on interleaved text and image data to predict the next token, offering photorealistic rendering and multimodal input support. This approach contrasts with diffusion models, focusing on sequential prediction.
  • Midjourney: Midjourney, a generative AI program, uses diffusion models, as inferred from comparisons with Stable Diffusion and DALL-E (Midjourney - Wikipedia). While proprietary, industry analyses suggest it leverages diffusion for real-time image generation; it is known for artistic outputs, accessed via Discord or its website, and entered open beta in July 2022.

These tools illustrate the diversity in approaches, with diffusion models dominating due to their quality, except for Grok's unique autoregressive method.

Breakdown of Technologies Behind AI Image Generation Models​


The core technologies driving these models include diffusion models, autoregressive models, and historical approaches like GANs and VAEs. Here's a deeper dive:



Diffusion Models: The State of the Art​


Diffusion models, as used in DALL-E, Imagen, and Midjourney, operate through a two-stage process:​


  • Forward Process: Gradually adds noise to an image over many steps, creating a sequence that runs from a clear image to pure noise.
  • Reverse Process: Trains a neural network, often a U-Net, to predict and remove the noise at each step, starting from pure noise and working back to a coherent image. This is akin to sculpting: the network chisels noise away step by step to reveal the form. For text-to-image, text embeddings guide this process, ensuring the image aligns with the prompt.
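These two steps can be sketched in a few lines of NumPy. This is a minimal toy illustration under assumed textbook defaults (a linear beta schedule from 1e-4 to 0.02), not any production model's code:

```python
import numpy as np

def make_noise_schedule(steps=1000):
    # Assumed textbook defaults: linearly increasing betas (noise levels).
    betas = np.linspace(1e-4, 0.02, steps)
    alpha_bars = np.cumprod(1.0 - betas)   # cumulative fraction of signal kept
    return betas, alpha_bars

def forward_noise(x0, t, alpha_bars, rng):
    # Forward process: jump straight to step t by mixing image and noise.
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps                          # eps is what the network learns to predict

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))            # stand-in for an image
_, alpha_bars = make_noise_schedule()
xt, eps = forward_noise(x0, t=999, alpha_bars=alpha_bars, rng=rng)
# By the last step almost no signal remains: xt is nearly pure noise.
```

The reverse process is the learned part: a U-Net is trained to predict `eps` from `xt` and `t`, and sampling runs that prediction backwards from pure noise.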
 

The architecture, as seen in Imagen, involves a text encoder (e.g., T5-XXL) and conditional diffusion models, with upsampling stages (64×64 to 1024×1024) using super-resolution diffusion models. DALL-E 2's decoder modifies Nichol et al.'s (2021) diffusion model, adding CLIP embeddings for guidance, with training details in Table 3 from the paper:

| Hyperparameter | AR prior | Diffusion prior | 64→256 Upsampler | 256→1024 Upsampler |
|---|---|---|---|---|
| Diffusion steps | – | 1000 | 1000 | 1000 |
| Noise schedule | – | cosine | cosine | linear |
| Sampling steps | – | 64 | 27 | 15 |
| Sampling variance method | – | analytic [2] | DDIM [47] | DDIM [47] |
| Model size | 1B | 1B | 700M | 300M |
| Channels | – | – | 320 | 192 |
| Depth | – | – | 3 | 2 |
| Channels multiple | – | – | 1,2,3,4 | 1,1,2,2,4,4 |
| Heads channels | – | – | – | – |
| Attention resolution | – | – | – | – |
| Text encoder context | 256 | 256 | – | – |
| Text encoder width | 2048 | 2048 | – | – |
| Text encoder depth | 24 | 24 | – | – |
| Text encoder heads | 32 | 32 | – | – |
| Latent decoder context | 384 | – | – | – |
| Latent decoder width | 1664 | – | – | – |
| Latent decoder depth | 24 | – | – | – |
| Latent decoder heads | 26 | – | – | – |
| Dropout | – | – | 0.1 | – |
| Weight decay | 4.0e-2 | 6.0e-2 | – | – |
| Batch size | 4096 | 4096 | 1024 | 512 |
| Iterations | 1M | 600K | 1M | 1M |
| Learning rate | 1.6e-4 | 1.1e-4 | 1.2e-4 | 1.0e-4 |
| Adam β2 | 0.91 | 0.96 | 0.999 | 0.999 |
| Adam ε | 1.0e-10 | 1.0e-6 | 1.0e-8 | 1.0e-8 |
| EMA decay | 0.999 | 0.9999 | 0.9999 | 0.9999 |

This table highlights hyperparameters, showing the computational intensity, with batch sizes up to 4096 and iterations in the millions.
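For the noise-schedule row, a short sketch contrasts the two families that appear in the table. The cosine form follows Nichol & Dhariwal's improved-DDPM schedule; the exact constants (s = 0.008, the beta range) are illustrative assumptions:

```python
import numpy as np

def cosine_alpha_bar(steps=1000, s=0.008):
    # Cosine schedule: alpha_bar(t) follows a squared cosine, so signal
    # is destroyed gently at the start and end of the process.
    t = np.arange(steps + 1) / steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]                       # normalize so alpha_bar(0) = 1

def linear_alpha_bar(steps=1000):
    # Linear beta schedule: signal decays noticeably faster early on.
    betas = np.linspace(1e-4, 0.02, steps)
    return np.cumprod(1.0 - betas)

cos_ab, lin_ab = cosine_alpha_bar(), linear_alpha_bar()
# Early in the process the cosine schedule keeps more of the image signal.
```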



Autoregressive Models: Sequential Prediction​


Grok's Aurora uses an autoregressive approach, predicting image tokens sequentially, akin to writing a story word by word. The xAI blog post describes it as a mixture-of-experts network, trained on billions of internet examples, excelling in photorealistic rendering. This method, detailed in the release, contrasts with diffusion by generating images part by part, potentially slower but offering unique capabilities like editing user-provided images.
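That token-by-token loop can be sketched as follows. Since Aurora's weights and interfaces are not public, the "model" here is a random-logits stand-in: a toy of the sampling loop only, not xAI's implementation.

```python
import numpy as np

def sample_next_token(logits, rng):
    # Softmax over logits, then sample one token id from the distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def generate_image_tokens(n_tokens, vocab_size, rng):
    # Autoregressive loop: each token is conditioned on everything so far.
    # A real system would run a transformer over the prompt plus the tokens
    # generated so far; random logits stand in for that model call here.
    tokens = []
    for _ in range(n_tokens):
        logits = rng.standard_normal(vocab_size)   # placeholder for model(tokens)
        tokens.append(sample_next_token(logits, rng))
    return tokens

rng = np.random.default_rng(0)
tokens = generate_image_tokens(n_tokens=16, vocab_size=1024, rng=rng)
# The 16 token ids would later be decoded back into image patches.
```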



Historical Approaches: GANs and VAEs​


GANs, with a generator and discriminator competing, and VAEs, encoding images into latent spaces for decoding, were early methods. However, diffusion models, as noted in Imagen's research, outperform them in fidelity and diversity, making them less common in current state-of-the-art systems.

How to Build an AI Image Generator from Scratch?​


Constructing an AI image generator from scratch is a monumental task, requiring:

  1. Data Requirements:
    • Millions of diverse image-text pairs, like those used for DALL-E, with broad enough coverage that the model learns a wide range of concepts.
  2. Computational Resources:
    • Training demands powerful GPUs or TPUs, with costs in thousands of GPU hours, reflecting the scale seen in DALL-E and Imagen. Infrastructure for distributed training, as implied in the papers, is crucial for handling large-scale data.
  3. Model Architecture:
    • For diffusion models, implement U-Net architectures, as in Imagen, with text conditioning via large language models. For autoregressive, use transformers, as in Aurora, handling sequential token prediction. The choice depends on desired output quality and speed.
  4. Training Process:
    • Data Preprocessing: Clean datasets, tokenize text, and resize images for uniformity, ensuring compatibility with model inputs.
    • Model Initialization: Leverage pre-trained models, like T5 for text encoding, to reduce training time, as seen in Imagen.
    • Optimization: Use advanced techniques, with learning rates and batch sizes from Table 3, ensuring stable convergence, especially for diffusion models.
  5. Challenges and Considerations:
    • Training Stability: Diffusion models, while stable, require careful tuning, unlike GANs prone to mode collapse. Ethical concerns, as noted in DALL-E's safety mitigations (DALL·E 2), include filtering harmful content and monitoring bias.
    • Compute Costs: High energy and hardware costs, with environmental impacts, are significant, necessitating efficient architectures like Imagen's Efficient U-Net.
    • Expertise Needed: Requires deep knowledge in machine learning, computer vision, and natural language processing, with skills in handling large-scale training pipelines.

This process, while feasible with resources, underscores the complexity, with open-source alternatives like Stable Diffusion offering starting points for enthusiasts.
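The data-preprocessing step described above can be sketched minimally. The whitespace tokenizer and nearest-neighbour resize below are toy assumptions; real pipelines use learned tokenizers (e.g. T5's) and proper image resampling:

```python
import numpy as np

def preprocess_pair(image, caption, vocab, size=64):
    # Resize to a fixed size x size grid by nearest-neighbour index picking.
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    # Tokenize the caption with a toy whitespace vocabulary (0 = unknown).
    token_ids = [vocab.get(word, 0) for word in caption.lower().split()]
    return resized, token_ids

vocab = {"a": 1, "photo": 2, "of": 3, "cat": 4}    # toy vocabulary
img = np.zeros((128, 96, 3))
resized, ids = preprocess_pair(img, "A photo of a cat", vocab)
# resized has shape (64, 64, 3); ids is [1, 2, 3, 1, 4]
```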

Conclusion​


AI image generation is dominated by diffusion models, with Grok's autoregressive approach adding diversity, and it showcases rapid technological innovation. Building a model from scratch demands significant data, compute, and expertise, highlighting the barriers to entry. As research progresses, expect advances in efficiency, ethics, and multimodal capabilities, further blurring the boundary between human and machine creativity.
 













1/13
@buildthatidea
Anthropic recently dropped fascinating research on how models like Claude actually think and work.

It's one of the most important research papers of 2025

Here are my 7 favorite insights 🧵



Gnhx2ijW4AEsAYf.jpg


2/13
@buildthatidea
1/ Claude plans ahead when writing poetry!

It identifies potential rhyming words before writing a line, then constructs sentences to reach those words.

It's not just predicting one word at a time.



GnhxXVuXMAAWLV7.jpg


3/13
@buildthatidea
2/ Claude has a universal language of thought

When processing questions in English, French, or Chinese, it activates the same internal features for concepts

It then translates these concepts into the specific language requested.



Gnhxa15W0AA2EWd.jpg


4/13
@buildthatidea
3/ Claude does mental math using parallel computational paths

One path is for approximation, another for precise digit calculation.

But when asked how it solved the problem, it describes using standard algorithms humans use



GnhxR4YWwAArscU.jpg


5/13
@buildthatidea
4/ Claude can “fake” its reasoning

When solving a math problem, it might give a full chain of thought that sounds correct

But inside, it never actually did the math

It just made up the steps to sound helpful



GnhxO95WsAEoYu2.jpg


6/13
@buildthatidea
5/ It performs multi-step real reasoning

When solving complex questions like "What's the capital of the state where Dallas is located?", Claude actually follows distinct reasoning steps: first activating "Dallas is in Texas," then "the capital of Texas is Austin."

It's not just memorization



GnhxIfBXQAA7uQ9.jpg


7/13
@buildthatidea
6/ Claude defaults to not answering when unsure

Claude’s instinct is to say “I don’t know.”

For known entities (like Michael Jordan), a "known entity" feature inhibits this refusal. But sometimes this feature misfires, causing hallucinations.



GnhxDqmXkAAyfli.png


8/13
@buildthatidea
7/ Jailbreaks work by exploiting conflicts inside Claude

In one example Claude was tricked into writing something dangerous like BOMB

It continued only because it wanted to finish a grammatically correct sentence

Once that was done it reverted to safety and refused to continue



Gnhw09VXgAAcJus.jpg


9/13
@buildthatidea
8/ Why this matters

- Better understanding of AI leads to better safety
- We can catch fake logic
- Prevent hallucinations
- Understand when and how reasoning happens
- And make more trustworthy systems



10/13
@buildthatidea
If you’re building, researching, or just curious about how language models work, this is a must-read.

Read it here: https://www.anthropic.com/research/tracing-thoughts-language-model



11/13
@buildthatidea
Want to build AI apps?

With BuildThatIdea - Launch GPT Wrappers in 60 Seconds!, you can build and monetize AI apps in 60 seconds

Sign up here:



12/13
@buildthatidea
That's a wrap ✨ Hope you enjoyed it

If you found this thread valuable:

1. Follow @0xmetaschool for more
2. Retweet the first tweet so more people can see it

[Quoted tweet]
Anthropic recently dropped fascinating research on how models like Claude actually think and work.

It's one of the most important research papers of 2025

Here are my 7 favorite insights 🧵


Gnhx2ijW4AEsAYf.jpg


13/13
@KatsDojo
Gonna deff be sharing this with the team ty always! 🫂




 