1/11
@gordic_aleksa
How exactly does OpenAI's o1 work? Here is a list of papers & summaries on LLM reasoning that I've recently read.
I'll split them into 2 categories:
1) prompt-based - enforce step by step reasoning & self-correcting flow purely using prompts
2) learning-based - bake in the above into the policy model's weights (or into a verifier - usually a PRM; process reward model)
--- Prompt-based
0) CoT et al: https://arxiv.org/pdf/2201.11903
Ground zero (plus I'm a C guy :P iykyk). It started with the Chain-of-Thought paper. This class of methods boils down to asking the LLM nicely to reveal its internal thoughts (e.g. "Let's think step by step"; more generally, telling the model to disclose its intermediate computation steps in some way).
A simple variation on CoT would be "CoT self-consistency" -> i.e. sample multiple CoT traces in parallel and use majority voting to find the "right" answer.
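A minimal sketch of self-consistency, assuming a generic `sample` callable that returns a (trace, final_answer) pair per call (the interface is hypothetical, not tied to any particular API):

```python
from collections import Counter
from typing import Callable, Tuple

def self_consistency(question: str,
                     sample: Callable[[str], Tuple[str, str]],
                     n: int = 16) -> str:
    """Sample n CoT traces (temperature > 0) and majority-vote on the final answers."""
    answers = []
    for _ in range(n):
        _trace, answer = sample("Let's think step by step.\n" + question)
        answers.append(answer)
    # The "right" answer is the one most traces converge on.
    return Counter(answers).most_common(1)[0][0]
```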
1) ToT (tree of thought): https://arxiv.org/pdf/2305.10601
Further generalizes the above (in CS terms we go from a linear list -> a tree): build up an m-ary tree of intermediate thoughts (thoughts = intermediate steps of a CoT); at each thought/node (see the sketch after the pro/con below):
a) run the “propose next thoughts” prompt (or just sample completions m times)
b) evaluate those thoughts (either independently or jointly)
c) keep the top m
cons: very expensive & slow
pro: works with off-the-shelf LLMs
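ToT is essentially a beam search over thoughts. A hedged sketch, where `propose` and `score` stand in for the "propose next thoughts" and evaluation prompts (both hypothetical interfaces):

```python
from typing import Callable, List

def tree_of_thought(question: str,
                    propose: Callable[[str, List[str]], List[str]],  # a) propose candidate next thoughts
                    score: Callable[[str, List[str]], float],        # b) evaluate a partial chain of thoughts
                    depth: int = 3,
                    beam: int = 5) -> List[str]:
    """Beam search over intermediate thoughts; returns the best chain found."""
    frontier: List[List[str]] = [[]]  # each element is a partial chain of thoughts
    for _ in range(depth):
        candidates = []
        for chain in frontier:
            for thought in propose(question, chain):
                candidates.append(chain + [thought])
        candidates.sort(key=lambda c: score(question, c), reverse=True)
        frontier = candidates[:beam]  # c) keep the top m
    return frontier[0] if frontier else []
```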
2) Self-Reflection: https://arxiv.org/pdf/2405.06682
If the response is incorrect, pass self-reflection feedback back to the LLM before it attempts to re-answer.
As input, the self-reflection step gets the gold answer and is prompted to explain how it would now solve the problem. The result is redacted before the feedback is passed back, to avoid leaking the solution.
Even re-answering given only a binary feedback (“your previous answer was incorrect”) is significantly stronger than the baseline (no feedback, just sample a response once).
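The reflect-then-retry loop, as I read it (function names are mine, not the paper's; `check` assumes a gold answer is available, as in the paper's setup):

```python
from typing import Callable

def answer_with_reflection(question: str,
                           answer: Callable[[str], str],       # the LLM being evaluated
                           check: Callable[[str, str], bool],  # compares a response against the gold answer
                           reflect: Callable[[str, str], str], # produces (redacted) self-reflection feedback
                           max_retries: int = 3) -> str:
    response = answer(question)
    for _ in range(max_retries):
        if check(question, response):
            return response
        feedback = reflect(question, response)  # sees the gold answer; output is redacted before reuse
        response = answer(f"{question}\n\nFeedback on your previous attempt:\n{feedback}")
    return response
```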
3) Self-Contrast: https://arxiv.org/pdf/2401.02009
a) create multiple solutions by evaluating diverse prompts derived from the original question (yields diverse perspectives on how to solve the problem)
b) pairwise contrast the solutions
c) generate a to-do checklist and use it to revise the generations from a)
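The three stages above in a hedged sketch; the prompt wording and the single `llm` callable are placeholders of mine, not the paper's exact setup:

```python
from typing import Callable

def self_contrast(question: str, llm: Callable[[str], str], n_perspectives: int = 3) -> str:
    # a) diverse solutions from diverse re-phrasings of the question
    solutions = [llm(f"Solve this from perspective {i + 1}: {question}") for i in range(n_perspectives)]
    # b) pairwise contrast the solutions
    contrasts = []
    for i in range(len(solutions)):
        for j in range(i + 1, len(solutions)):
            contrasts.append(llm("Contrast these two solutions and list their discrepancies:\n"
                                 f"A: {solutions[i]}\nB: {solutions[j]}"))
    # c) turn the discrepancies into a checklist and revise
    checklist = llm("Summarize these discrepancies as a revision checklist:\n" + "\n".join(contrasts))
    return llm(f"Revise your solution to '{question}' using this checklist:\n{checklist}")
```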
4) Think Before You Speak: https://arxiv.org/pdf/2311.07445
Introduces the CSIM method: 5 prompts that make the model better at dialogue applications.
The 5 prompts they use to improve communication skills are "empathy", "topic transition", "proactively asking questions", "concept guidance", and "summarizing often".
Their LLM has 2 roles: thinking and speaking. The thinking role, or the "inner monologue", is occasionally triggered by the 5 prompts and is not displayed to the user; instead it is used as input for the user-facing speaking role.
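A rough sketch of that thinking/speaking split (my paraphrase of the mechanism; the prompts are placeholders):

```python
from typing import Callable, List

CSIM_SKILLS = ["empathy", "topic transition", "proactively asking questions",
               "concept guidance", "summarizing often"]

def csim_reply(history: List[str], llm: Callable[[str], str], skill: str = "empathy") -> str:
    """Hidden inner monologue (thinking role) feeds the user-facing speaking role."""
    dialogue = "\n".join(history)
    monologue = llm(f"[thinking role, hidden] Using the '{skill}' skill, "
                    f"plan how to respond to this dialogue:\n{dialogue}")
    return llm(f"[speaking role] Dialogue:\n{dialogue}\nHidden plan:\n{monologue}\nRespond to the user.")
```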
I think these 5 bullet points nicely capture the main modes of prompt-based methods I've observed -> lmk if I missed an important one!
--- learning-based -> coming up in the follow-up post
2/11
@gordic_aleksa
One more here (I guess I should have labeled this bucket "not learning-based" instead of "prompt-based"... anyhow):
Chain-of-Thought Reasoning w/o prompting: https://arxiv.org/pdf/2402.10200
Introduces the CoT-decoding method, showing that pre-trained LMs can reason w/o explicit CoT prompting.
The idea is simple: sample top-k paths and compute the average difference between the probability of the 1st and 2nd most likely token along the answer’s token sequence. Take the path that maximizes this metric - that is likely the CoT path.
Has a good argument against the CoT self-consistency method: most of the sampled solutions are incorrect; only by using their heuristic do they pick out the implicit CoT traces.
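The confidence metric in a sketch, assuming you can read off the top-2 token probabilities at each decoding step of the answer span (a hypothetical interface over whatever decoding loop you use):

```python
from typing import List, Tuple

def cot_decoding_score(answer_top2_probs: List[Tuple[float, float]]) -> float:
    """Average gap between the top-1 and top-2 token probabilities over the answer's tokens.
    A larger gap = a more confident decoding path, which tends to be the implicit CoT path."""
    if not answer_top2_probs:
        return 0.0
    return sum(p1 - p2 for p1, p2 in answer_top2_probs) / len(answer_top2_probs)

def pick_cot_path(paths: List[List[Tuple[float, float]]]) -> int:
    """Among the top-k decoded paths, return the index of the one maximizing the confidence gap."""
    return max(range(len(paths)), key=lambda i: cot_decoding_score(paths[i]))
```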
3/11
@Ki_Seki_here
There are more options here.
[Quoted tweet]
Here, I'd like to share one line: reasoning topologically. @OpenAI o1 uses one such technique, Quiet-STaR. Beyond this, there are other related techniques: optimizing the input, introducing more topological relationships, applying inference in the decoding phase...
4/11
@bate5a55
Fascinating breakdown! Interestingly, OpenAI's o1 reportedly employs a dynamic Tree of Thoughts during inference, allowing it to adaptively optimize reasoning paths in real-time—almost like crafting its own cognitive map on the fly.
5/11
@sabeerawa05
Superb!
6/11
@CoolMFcat
Youtube video when
7/11
@Mikuk84
Thanks so much for the paper list!
8/11
@marinac_dev
What about swarm they released recently?
IMO o1 is just the swarm principle on steroids.
9/11
@attention_gan
Does that mean it is not actually thinking, but trying to use a combination of steps seen in the data?
10/11
@joao_monoirelia
.
11/11
@MarkoVelich
Nice overview. Recently I did a talk on this topic. Put a bunch of references throughout: Intuition Behind Reasoning in LLMs
My view on all these post-training/prompting techniques is that they are basically a filtering of the pre-training corpus: ways to squeeze good traces out of the model.
1/9
@gordic_aleksa
How exactly does OpenAI's o1 work? Part 2. Here is a list of papers & summaries on LLM reasoning that I've recently read. All learning-based.
0) STaR: Self-Taught Reasoner https://arxiv.org/pdf/2203.14465
Ground zero. Instead of always having to CoT-prompt, bake that behavior into the model's defaults. Given a dataset of (question, answer) pairs, manually curate a few CoT traces and use them as few-shot examples to generate a (rationale, answer) tuple for each remaining question in the dataset ("bootstrap reasoning"). SFT on the (question, rationale, answer) triples. Do multiple iterations of this, i.e. collect data, retrain, and keep sampling from the new policy (in practice I've observed people use ~3 iterations). Con: it can only learn from correct triples.
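One STaR round, compressed into a sketch (helper names are mine; the rejection of wrong rationales is the key step):

```python
from typing import Callable, List, Tuple

def star_iteration(dataset: List[Tuple[str, str]],              # (question, gold answer) pairs
                   generate: Callable[[str], Tuple[str, str]],  # few-shot CoT policy -> (rationale, answer)
                   finetune: Callable[[List[Tuple[str, str, str]]], None]) -> None:
    """Bootstrap rationales, keep only those that reach the gold answer, SFT on the kept triples."""
    sft_data = []
    for question, gold in dataset:
        rationale, answer = generate(question)
        if answer == gold:  # con: only correct triples are usable
            sft_data.append((question, rationale, answer))
    finetune(sft_data)      # then sample from the new policy and repeat (~3 rounds in practice)
```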
1) V-STaR: https://arxiv.org/pdf/2402.06457
A variation on STaR. Collect both correct and incorrect samples. The incorrect ones can be used to train a verifier LLM via DPO. During inference, additionally use the verifier to rank the generations.
---
[Recommended reading for 2) below]
Let’s Verify Step by Step: https://arxiv.org/pdf/2305.20050
They train a Process Supervision Reward Model (PRM) and show its advantage over an Outcome Supervision RM (ORM) - i.e. don't just pass in a whole generation and ask how good it is, pass in individual CoT steps instead (finer resolution).
Data collection: humans were employed to annotate each line of step-by-step solutions (math problems) - expensive!! They were given a curated list of samples that were automatically selected as "convincing wrong-answer" solutions - a form of hard-sample mining.
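The PRM vs. ORM distinction in a few lines (the scoring callables and the min-aggregation are placeholders; step scores can also be aggregated in other ways, e.g. as a product):

```python
from typing import Callable, List

def orm_score(question: str, solution: str,
              score_whole: Callable[[str, str], float]) -> float:
    """Outcome supervision: one score for the entire generation."""
    return score_whole(question, solution)

def prm_score(question: str, steps: List[str],
              score_step: Callable[[str, List[str], str], float]) -> float:
    """Process supervision: score each CoT step given the steps so far, then aggregate."""
    if not steps:
        return 0.0
    return min(score_step(question, steps[:i], step) for i, step in enumerate(steps))
```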
---
2) Improve Mathematical Reasoning in LMs: https://arxiv.org/pdf/2406.06592
Proposes OmegaPRM - a process RM trained on data collected using MCTS (AlphaGo vibes? yes, the paper comes from DeepMind). As the policy for collecting PRM data they used an SFT-ed Gemini Pro (instruction samples distilled from Gemini Ultra). Gemini Pro combined with OmegaPRM-weighted majority voting yielded nice improvements on math benchmarks. Suitable only when there is a golden answer; doesn't work for open-ended tasks.
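PRM-weighted majority voting, which is how the verifier gets used at inference time (a sketch; the exact per-solution score aggregation varies):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def prm_weighted_vote(candidates: List[Tuple[str, float]]) -> str:
    """candidates = (final_answer, prm_score) per sampled solution.
    Instead of counting one vote per solution, sum the PRM scores per distinct final answer."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    return max(totals, key=totals.get)
```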
3) Learn Beyond The Answer: https://arxiv.org/pdf/2406.12050
Instead of increasing the number of samples in the SFT dataset (STaR-like), they augment/extend the samples by appending a reflection to each existing (q, a) tuple (reflection = alternative reasoning plus follow-ups like analogy & abstraction). Complementary to STaR.
4) Quiet-STaR: https://arxiv.org/pdf/2403.09629
Uses RL (the REINFORCE method): it picks rationales, branching off of a thought, that increase the likelihood of the correct answer, and subsequently trains on them. Adds new start/end-of-thought tokens and an MLP for mixing the rationale stream with the default stream. Interesting ideas; I feel that due to its complexity it won't survive the Bitter Lesson's filter.
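My reading of the reward signal, as a hedged sketch (not the paper's exact objective; `logprob_of_answer` is a hypothetical helper, and the thought tokens are only shown as strings):

```python
from typing import Callable, List

def rationale_rewards(context: str, answer: str, rationales: List[str],
                      logprob_of_answer: Callable[[str, str], float]) -> List[float]:
    """REINFORCE-style signal: a rationale is rewarded if inserting it between the context and the answer
    raises the answer's log-likelihood relative to the no-rationale baseline."""
    baseline = logprob_of_answer(context, answer)
    return [logprob_of_answer(f"{context} <|startofthought|> {r} <|endofthought|> ", answer) - baseline
            for r in rationales]
```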
5) Scaling LLM Test-Time Compute: https://arxiv.org/pdf/2408.03314
They first estimate the difficulty of a user query, placing it into one of 5 difficulty bins (I think they need 2048 samples for this!). Then, depending on the query's difficulty, they deploy different techniques to spend the test-time compute optimally.
They experimented with 2 categories of methods: search (requires a PRM verifier) & revisions (afaik they only did intra-category comparisons, i.e. they didn't compare search vs revision).
For search they experimented with best-of-N-weighted, beam, and lookahead.
For easier Qs best-of-N is the best method; later it's beam search (lookahead never pays off). For revisions they first do SFT on samples that consist of 0-4 incorrect intermediate steps followed by the correct solution (to teach the model to self-correct). Subsequently they test applying it sequentially vs in parallel. There exists an "optimal" ratio of sequential-to-parallel revisions depending on the difficulty bin (toy sketch below).
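A toy sketch of that allocation idea: spend a fixed sample budget differently per difficulty bin. Only the shape of the idea (bin -> sequential/parallel mix) is from the paper; the split numbers below are placeholders, not the paper's:

```python
from typing import Dict

def allocate_budget(difficulty_bin: int, budget: int = 64) -> Dict[str, int]:
    """Split a fixed sample budget between parallel chains and sequential revisions per chain.
    The optimal sequential-to-parallel ratio depends on the difficulty bin; these ratios are made up."""
    assert 1 <= difficulty_bin <= 5
    revisions_per_chain = {1: 8, 2: 8, 3: 4, 4: 4, 5: 2}[difficulty_bin]
    parallel_chains = max(1, budget // revisions_per_chain)
    return {"parallel_chains": parallel_chains, "revisions_per_chain": revisions_per_chain}
```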
6) Agent Q: https://arxiv.org/pdf/2408.07199
Really, the main idea is to use MCTS to do test-time search (judging by the performance they get, not by how the paper's authors frame it).
The second core idea is to leverage successful & unsuccessful trajectories collected using a "guided MCTS" (guided by an LLM judge) to improve the base policy via DPO; this gives them "Agent Q". Agent Q + test-time MCTS yields the best results.
Note: they operate in a very specific environment (the web) and focus on a narrow task: booking a table.
7) Training LMs to Self-Correct via RL: https://arxiv.org/pdf/2409.12917
They introduce SCoRe, a multi-turn (multi = 2) RL method. They show that STaR-style methods fail to improve in a multi-turn setup (due to behavior collapse, as measured by the edit-distance histogram; i.e. STaR models are reluctant to deviate significantly from their 1st solution).
Training proceeds in 2 stages:
1) train the base model to produce high-reward responses at the 2nd attempt while forcing the model not to change its 1st attempt (via a KL divergence constraint)
2) jointly maximize the reward of both attempts; to prevent a collapse back to non-self-correcting behavior they do reward shaping: a bonus if the model achieves a higher reward on the 2nd attempt and a heavy penalty for correct->incorrect transitions (sketched below).
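The stage-2 reward shaping, paraphrased in a sketch (the coefficients are illustrative, not the paper's):

```python
def shaped_reward(correct_first: bool, correct_second: bool,
                  bonus: float = 1.0, flip_penalty: float = 2.0) -> float:
    """Base reward for the 2nd attempt, plus a bonus for genuine self-correction (wrong -> right)
    and a heavy penalty for un-correcting (right -> wrong)."""
    reward = 1.0 if correct_second else 0.0
    if (not correct_first) and correct_second:
        reward += bonus          # reward improving on the 1st attempt
    if correct_first and not correct_second:
        reward -= flip_penalty   # heavily penalize correct -> incorrect transitions
    return reward
```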
---
That's a wrap for now, so how does o1 work?
No idea, but likely involves above ideas + a shyt ton of compute & scale (data, params).
Let me know if I missed an important paper that's not an obvious interpolation of the above ideas (unless it's seminal despite that?).
2/9
@gordic_aleksa
thanks @_philschmid for aggregating the list: LLM Reasoning Papers - a philschmid Collection
3/9
@JUQ_AI
Are you impressed with o1?
4/9
@barrettlattimer
Thanks for making this list! Another is
Sparse Rewards Can Self-Train Dialogue Agents: https://arxiv.org/pdf/2409.04617
Introduces JOSH, an algorithm that uses sparse rewards to extract the ideal behavior of a model in multi-turn dialogue; it improves quality for both small & frontier models.
5/9
@neurosp1ke
I would be very interested in creating a "must-read essentials" short-list out of the open-thought/system-2-research (System 2 Reasoning Link Collection) list. I will take your list as a first candidate.
6/9
@axel_pond
top quality content, king
7/9
@synthical_ai
Dark mode for this paper, for those who read at night: STaR: Bootstrapping Reasoning With Reasoning
8/9
@MapaloChansa1
Interesting
9/9
@eatnow240008
[2310.04363] Amortizing intractable inference in large language models
Perhaps this?