bnew

Veteran
Joined
Nov 1, 2015
Messages
57,435
Reputation
8,509
Daps
160,153

1/1
@HaHoang411
🌟 Mind-blowing work by the team at @FLAIR_Ox! They've created Kinetix, a framework for training general-purpose RL agents that can tackle physics-based challenges.
The coolest part? Their agents can solve complex physical reasoning tasks zero-shot!
🥳 Congrats @mitrma and team.

[Quoted tweet]
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!

We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧡


https://video.twimg.com/ext_tw_video/1856003600159256576/pu/vid/avc1/1280x720/zJNdBD1Yq0uFl9Nf.mp4
















1/12
@mitrma
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL!

We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments.
1/🧡



https://video.twimg.com/ext_tw_video/1856003600159256576/pu/vid/avc1/1280x720/zJNdBD1Yq0uFl9Nf.mp4

2/12
@mitrma
👾 Kinetix can represent problems ranging from robotic locomotion and grasping, to classic RL environments and video games, all within a unified framework. This opens the door to training a single generalist agent for all these tasks!
2/



https://video.twimg.com/ext_tw_video/1856003839851220992/pu/vid/avc1/640x640/J_w1M8wm8ibiGCAn.mp4

3/12
@mitrma
🎲 By procedurally generating random environments, we train an RL agent that can zero-shot solve unseen handmade problems. This includes some where RL from scratch fails!
3/



https://video.twimg.com/ext_tw_video/1856003979878051840/pu/vid/avc1/720x720/JAcE26Hprn1NXPvU.mp4

4/12
@mitrma
🟩 🟦 🟥 Each environment has the same goal: make 🟩 touch 🟦 while preventing 🟩 touching 🟥. The agent controls all motors and thrusters.

In this task the car has to first be flipped with thrusters. The general agent solves it zero-shot, having never seen it before.
4/



https://video.twimg.com/ext_tw_video/1856004286943002624/pu/vid/avc1/720x720/hjhITONkJiDY9tD2.mp4

5/12
@mitrma
🚗 Our general agent shows emergent physical reasoning capabilities, for instance being able to zero-shot control unseen morphologies by moving them underneath a goal (🔵).
5/



https://video.twimg.com/ext_tw_video/1856004409559306241/pu/vid/avc1/994x540/AA6c6MHpWRkFt3OJ.mp4

6/12
@mitrma
🚀 We also show that finetuning this general model on target tasks is more sample efficient than training from scratch, providing a step towards a foundation model for RL.

In some cases, training from scratch completely fails, while our finetuned general model succeeds 👇
6/



https://video.twimg.com/ext_tw_video/1856004545525972993/pu/vid/avc1/1280x720/jMqgYcCwx-q4tSpm.mp4

7/12
@mitrma
📈 One big takeaway from this work is the importance of autocurricula. In particular, we found significantly improved results by dynamically prioritising levels with high 'learnability'.
7/
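As a rough illustration of what learnability-based prioritisation can look like: a common choice in this line of work is to score each level by p * (1 - p), where p is the agent's current success rate, and sample training levels in proportion to that score. The exact formula below is our assumption for illustration, not a quote from the paper.

import numpy as np

# Success rate p for 1,000 candidate levels (toy numbers).
rng = np.random.default_rng(0)
success_rate = rng.uniform(0.0, 1.0, size=1000)

# Levels that are always solved (p ~ 1) or never solved (p ~ 0) get low weight;
# levels on the frontier of the agent's ability get sampled most often.
learnability = success_rate * (1.0 - success_rate)
probs = learnability / learnability.sum()

# Sample the next training batch of levels in proportion to learnability.
next_batch = rng.choice(len(success_rate), size=64, p=probs)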



GcHacg4WUAAmHTp.jpg


8/12
@mitrma
🍎 The core of Kinetix is our new 2D rigid body physics engine: Jax2D. This is a minimal rewrite of the classic Box2D engine made by @erin_catto. Jax2D allows us to run thousands of heterogeneous parallel environments on a single GPU (yes, you can vmap over different tasks!)
8/
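A minimal sketch of the vmap pattern the tweet alludes to, using a toy step function rather than the real Jax2D API (the state layout and dynamics here are assumptions for illustration only):

import jax
import jax.numpy as jnp

# Toy per-environment step: state is a dict of arrays, action is a 2D vector.
def step(state, action):
    pos = state["pos"] + 0.01 * state["vel"]      # integrate position
    vel = state["vel"] + 0.01 * action            # apply action as acceleration
    return {"pos": pos, "vel": vel}

# Because every environment is padded to the same array shapes, vmap can step
# thousands of (potentially different) tasks in one fused, jit-compiled GPU call.
n_envs = 4096
states = {"pos": jnp.zeros((n_envs, 2)), "vel": jnp.ones((n_envs, 2))}
actions = jnp.zeros((n_envs, 2))

batched_step = jax.jit(jax.vmap(step))
next_states = batched_step(states, actions)       # next_states["pos"].shape == (4096, 2)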



9/12
@mitrma
🔧 Don't take our word for it, try it out for yourself!
Create your own levels in your browser with Kinetix.js and see how different pretrained agents perform.
9/



https://video.twimg.com/ext_tw_video/1856004915501350912/pu/vid/avc1/1422x720/7wj1y_BcHHUnNtwx.mp4

10/12
@mitrma
This work was co-led with @mcbeukman and done at @FLAIR_Ox with @_chris_lu_ and @j_foerst.
Blog: https://kinetix-env.github.io/
GitHub: FLAIROx/Kinetix (Reinforcement learning on general 2D physics environments in JAX)
arXiv: [2410.23208] Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
end/



11/12
@_k_sridhar
Very cool paper! FYI, we recently pretrained a generalist agent that can generalize to unseen atari/metaworld/mujoco/procgen environments simply via retrieval-augmentation and in-context learning. Our work uses an imitation learning approach. REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context In New Environments.



12/12
@mitrma
This is really cool! Let's meet up and chat at ICLR if we both end up going?




 








1/11
@tanishqkumar07
[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre- and post-training https://arxiv.org/pdf/2411.04330. TL;DR:

- Models become harder to post-train quantize as they are overtrained on lots of data, so that eventually more pretraining data can be actively harmful if quantizing post-training!
- The effects of putting weights, activations, or attention in varying precisions during pretraining are consistent and predictable, and fitting a scaling law suggests that pretraining at high (BF16) and next-generation (FP4) precisions may both be suboptimal design choices!

Joint work with @ZackAnkner @bfspector @blake__bordelon @Muennighoff @mansiege @CPehlevan @HazyResearch @AdtRaghunathan.



GcH1RBoWwAAQp1q.jpg


2/11
@tanishqkumar07
[2/7] We first study the common technique of post-train quantizing model weights, finding that the longer you train/the more data seen during pretraining, the more sensitive the model becomes to quantization at inference-time, explaining why Llama-3 may be harder to quantize.
In fact, this loss degradation is roughly a power law in the token/parameter ratio seen during pretraining, so that you can predict in advance the critical data size beyond which pretraining on more data is actively harmful if you're serving a quantized model. The intuition might be that as more knowledge is compressed into weights as you train on more data, a given perturbation will damage performance more.
Below is a fixed language model overtrained significantly to various data budgets up to 30B tokens, then post-train quantized afterwards. This demonstrates how more pretraining FLOPs do not always lead to better models served in production.
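To make the "roughly a power law in the token/parameter ratio" claim concrete, here is a hedged sketch of how such a degradation curve could be fitted; the data points and the exact functional form are illustrative assumptions, not the paper's fitted values:

import numpy as np
from scipy.optimize import curve_fit

# Toy measurements: extra loss caused by post-train quantization, as a function
# of the tokens-per-parameter ratio D/N seen during pretraining (made-up values).
ratio = np.array([50, 100, 200, 400, 800, 1600], dtype=float)
degradation = np.array([0.010, 0.018, 0.031, 0.055, 0.098, 0.170])

def power_law(x, a, b):
    return a * x ** b

(a, b), _ = curve_fit(power_law, ratio, degradation, p0=(1e-3, 1.0))
print(f"degradation ~ {a:.2e} * (D/N)^{b:.2f}")

# With a fit like this in hand, you can solve for the D/N beyond which the extra
# quantization penalty outweighs the pretraining gain -- the "critical data size".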



GcH9H_xXMAAwP__.jpg


3/11
@tanishqkumar07
[3/7] We then turn our attention to training in low precision. We study both quantization-aware training (weights only) and low-precision training (everything in low precision). We decompose the model into weights, activations, and KV cache, finding scaling laws for loss when any of these are quantized to any precision, and develop a compositional and interpretable functional form to predict the effect on loss of quantizing any combination of the three during pretraining.



4/11
@tanishqkumar07
[4/7] Our scaling law relies on a notion of "effective parameter count" which we posit is the quantity that is reduced when you lower precision at a fixed number of real parameters, so that a 1 billion parameter model with everything trained in FP4 has a comparable number of "effective parameters" to a 250m model in BF16.

While weights can be trained in low precision without issue, activations and KV cache are sensitive. Below is the normalized "effective parameter count" as a function of precision for each of the (weights, activations, KV cache) as well as when they are all held to the same precision (tied) based on our fits.
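A hedged sketch of the "effective parameter count" intuition: one saturating factor per quantized component, all tied to the same precision. The form 1 - exp(-P/gamma) and the constant gamma below are assumptions chosen only to reproduce the 1B-in-FP4 roughly equals 250M-in-BF16 intuition from the thread, not the paper's fitted constants.

import math

def effective_params(n_params: float, bits: float, gamma: float = 4.0) -> float:
    # One saturating factor per quantized component (weights, activations, KV cache),
    # all held at the same precision here.
    per_component = 1.0 - math.exp(-bits / gamma)
    return n_params * per_component ** 3

for bits in (4, 8, 16):
    print(f"{bits:>2}-bit: ~{effective_params(1e9, bits):.2e} effective parameters")
# 4-bit  -> ~2.5e8 (roughly a quarter of the real 1e9 parameters)
# 16-bit -> ~9.5e8 (close to the full count)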



GcH7mwpXEAAKOo5.jpg


5/11
@tanishqkumar07
[5/7] Finally, we are able to unify our findings for pre- and post-training into an interpretable functional form that predicts loss from pre- and post-training in any combination of precision. We find that pretraining in low precision "robustifies" a model to post-train quantization in a quantitatively predictable way, but by less than you would intuitively expect, for reasons we outline and test in the paper.



6/11
@tanishqkumar07
[6/7] Our work has several limitations -- we keep a controlled architecture and setup when doing experiments, but in practice architectural tweaks are often deliberately made to accommodate low-precision training. We also fit scaling laws on relatively small language models (up to ~250m) because we train over 450 models on large data budgets (up to over 25b tokens). We are excited for future work to study these effects at larger model scale!



GcH8wvFXsAAGJQX.jpg


7/11
@tanishqkumar07
[7/7] Many thanks to @Tim_Dettmers @chrismdesa @realDanFu for super helpful feedback as well as to the entire @HazyResearch team for their support! Models from our 465+ pretraining runs will soon be on HuggingFace for everyone to play around with, and code will also be released! The preprint is at https://arxiv.org/pdf/2411.04330



8/11
@matdmiller
@threadreaderapp unroll



9/11
@threadreaderapp
@matdmiller Hello, you can read it here: Thread by @Tanishq97836660 on Thread Reader App. Have a good day. 🤖



10/11
@itsclivetime
insanely cool work! :D embrace the scaling laws

will be cool to see how sota quantization schemes (mxfp, Pw!=Pkv!=Pa, etc) shift the frontier

also imho - spending half your compute budget on 1 big run to check that the fit generalizes to big models would be worth it



11/11
@omarsar0
Very interesting paper! The findings on the different precisions are really important. It's good to see papers like this that investigate these scaling laws more closely and from a different angle. Congrats!







Essential Questions to Capture the Main Points and Core Meaning of the Text

  1. What is the central theme or argument of the paper, and how does it relate to the quantization of large language models like Llama-3?
    • This question addresses the main idea of the paper, which is the study of scaling laws for precision in pre- and post-training of large language models and the challenges associated with quantizing these models.
  2. How does the amount of pretraining data affect the sensitivity of a model to quantization, and what are the implications for model performance?
    • This question highlights the key finding that longer training or more data seen during pretraining makes the model more sensitive to quantization at inference time, leading to potential performance degradation.
  3. What are the effects of quantizing different components (weights, activations, KV cache) of a model during pretraining, and how do these effects relate to the concept of "effective parameter count"?
    • This question delves into the detailed analysis of how different parts of the model respond to quantization and introduces the concept of "effective parameter count" as a way to understand these effects.
  4. How does pretraining in low precision impact the robustness of a model to post-train quantization, and what are the quantitative predictions from the study?
    • This question explores the findings on how pretraining in low precision can make a model more robust to post-train quantization and the quantitative aspects of these predictions.
  5. What are the limitations of the study, and what future directions are suggested for further research on this topic?
    • This question addresses the limitations mentioned in the paper, such as the controlled architecture and the scale of the models studied, and looks at potential future research directions.

Detailed Answers to the Generated Questions

1. What is the central theme or argument of the paper, and how does it relate to the quantization of large language models like Llama-3?

The central theme of the paper is the study of scaling laws for precision in both pre- and post-training of large language models. The authors investigate how the precision of model components (such as weights, activations, and attention mechanisms) during training affects the model's performance when quantized. Specifically, they find that models become harder to quantize post-training as they are overtrained on large amounts of data, which can lead to performance degradation. This is particularly relevant to models like Llama-3, where the paper suggests that overtraining can make these models more sensitive to quantization.

2. How does the amount of pretraining data affect the sensitivity of a model to quantization, and what are the implications for model performance?

The amount of pretraining data significantly affects the sensitivity of a model to quantization. The study shows that the longer a model is trained or the more data it sees during pretraining, the more sensitive it becomes to quantization at inference time. This sensitivity follows a power law in the token/parameter ratio seen during pretraining, indicating that beyond a certain critical data size, additional pretraining data can be actively harmful if the model is to be served in a quantized form. This means that while more pretraining data generally improves model performance, it can also make the model more vulnerable to performance degradation when quantized.

3. What are the effects of quantizing different components (weights, activations, KV cache) of a model during pretraining, and how do these effects relate to the concept of "effective parameter count"?

The study decomposes the model into weights, activations, and KV cache and examines the effects of quantizing each component. It finds that weights can be trained in low precision without significant issues, but activations and KV cache are more sensitive to quantization. The concept of "effective parameter count" is introduced to explain these effects; it posits that lowering precision reduces the effective number of parameters, even if the actual number of parameters remains the same. For example, a 1 billion parameter model trained in FP4 precision has an effective parameter count comparable to a 250 million parameter model trained in BF16 precision. This framework helps predict the loss associated with quantizing different components of the model.

4. How does pretraining in low precision impact the robustness of a model to post-train quantization, and what are the quantitative predictions from the study?

Pretraining in low precision can "robustify" a model to post-train quantization, but this effect is less than one might intuitively expect. The study formulates an interpretable functional form that predicts the loss from both pre- and post-training in any combination of precision. This suggests that while pretraining in low precision can make a model more resilient to post-train quantization, the benefits are quantitatively predictable and not as significant as might be hoped. This finding helps in designing more efficient training strategies that balance precision and robustness.

5. What are the limitations of the study, and what future directions are suggested for further research on this topic?

The study has several limitations. It was conducted with a controlled architecture and setup, which may not reflect real-world scenarios where architectural tweaks are often made to accommodate low-precision training. Additionally, the experiments were conducted on relatively small language models (up to ~250 million parameters) due to the large number of models trained over extensive data budgets. Future research directions include studying these effects at larger model scales and exploring how different architectural modifications impact the findings. The authors also suggest spending more compute budget on larger runs to verify the generalizability of their fits to bigger models.
 







1/6
@HaHoang411
@NousResearch has just released the Forge Reasoning model.

Try it free here: NOUS CHAT | Talk to Hermes



2/6
@HaHoang411
Blog announcement: Introducing the Forge Reasoning API Beta and Nous Chat: An Evolution in LLM Inference - NOUS RESEARCH



3/6
@HaHoang411
It utilizes 3 test-time-compute architectures:
- Chain of Code
- Monte Carlo Tree Search
- Mixture of Agents



GcNsJrjW4AA9JvZ.jpg


4/6
@HaHoang411
Testing on an example in the ARC-Challenge dataset from
@allen_ai



https://video.twimg.com/ext_tw_video/1856449554243022848/pu/vid/avc1/720x946/X2NDUvqkC9Dw7_9t.mp4

5/6
@HaHoang411
Testing on a NAPLEX (US Pharmacist License Exam). And yes the answer is perfect.



https://video.twimg.com/ext_tw_video/1856449735210524672/pu/vid/avc1/720x946/i_96J8Wt2bUEzZ-l.mp4

6/6
@HaHoang411
Testing with simple coding ability. Seems like it is not really optimized?



https://video.twimg.com/ext_tw_video/1856451124561117184/pu/vid/avc1/720x946/VGMijiyxb6Z764KL.mp4


 



X Sues to Block California Election Deepfake Law 'In Conflict' With First Amendment


The lawsuit claims AB 2655 "will inevitably result in the censorship of wide swaths of valuable political speech"

(CREDIT: Slaven Vlasic/Getty Images for The New York Times)

Sean Burch

November 15, 2024 @ 8:06 AM

X sued to block a new California law that would require social media platforms to censor "materially deceptive content," aka deepfakes, about politicians in the lead-up to an election.

The company, owned by Elon Musk, claimed in its Thursday court filing that the new law would trample the First Amendment, as well as Section 230 of the Communications Decency Act, which gives social platforms broad legal immunity to moderate content however they see fit.

"It is difficult to imagine a statute more in conflict with core First Amendment principles," the lawsuit, filed in the Eastern District of California Federal Court, said.

The new law in question, Assembly Bill 2655, which has been dubbed the "Defending Democracy from Deepfake Deception Act of 2024," would require platforms like X to remove "inauthentic, fake, or false" content of politicians 120 days before an election. Platforms would also have to "develop procedures" that allow California residents to file complaints about altered content, and the law would also require platforms to "label certain additional content inauthentic."

AB 2655 is set to go into effect next year.

"This system will inevitably result in the censorship of wide swaths of valuable political speech and commentary and will limit the type of 'uninhibited, robust, and wide-open' 'debate on public issues' that core First Amendment protections are designed to ensure," X's lawsuit said, citing the 1964 case New York Times v. Sullivan.

The new law, the lawsuit added, would impose "unintelligible prohibitions" on political speech, "greatly incentivizing covered platforms to censor all content that could reasonably fall within the statute's purview to avoid substantial enforcement costs." This will "lead to censorship at the direction of the State," the lawsuit said, due to AB 2655's "draconian and one-sided" provisions.

If this sounds familiar, that's because a California judge blocked a similar anti-deepfake law last month, two weeks after California Gov. Gavin Newsom signed it into law. U.S. District Judge John A. Mendez agreed with Musk that the law, which prohibited the "distribution of materially deceptive audio or visual media of a candidate" within two months of an election, unless the post included a disclosure that the content was a deepfake, went too far.

Mendez said the law, AB 2839, gave legislators "unbridled license to bulldoze over the longstanding tradition of critique, parody, and satire protected by the First Amendment."
 



Cerebras Now The Fastest LLM Inference Processor; It's Not Even Close.

Karl Freund

Contributor

Founder and Principal Analyst, Cambrian-AI Research LLC

Nov 18, 2024, 02:00pm EST

The company tackled inferencing the Llama-3.1 405B foundation model and just crushed it. And for the crowds at SC24 this week in Atlanta, the company also announced it is 700 times faster than Frontier, the world's fastest supercomputer, on a molecular dynamics simulation.

The Cerebras Wafer Scale Engine, effectively the world's largest and fastest chip.
Cerebras Systems, Inc.

"There is no supercomputer on earth, regardless of size, that can achieve this performance," said Andrew Feldman, Co-Founder and CEO of the AI startup. As a result, scientists can now accomplish in a single day what it took two years of GPU-based supercomputer simulations to achieve.


Cerebras Inferencing Llama-3.1 405B


When Cerebras announced its record-breaking performance on the 70 billion parameter Llama 3.1, it was quite a surprise; Cerebras had previously focussed on using its Wafer Scale Engine (WSE) on the more difficult training part of the AI workflow. The memory on a CS3 is fast on-chip SRAM instead of the larger (and 10x slower) High Bandwidth Memory used in data center GPUs. Consequently, the Cerebras CS3 provides 7,000x more memory bandwidth than the Nvidia H100, addressing Generative AI's fundamental technical challenge: memory bandwidth.
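A back-of-the-envelope way to see why memory bandwidth is the bottleneck the article points to: for single-stream decoding, every generated token has to stream roughly all of the model's weights through memory, so tokens per second per user is bounded by bandwidth divided by weight bytes. The sketch below assumes 16-bit weights and takes the article's "7,000x" bandwidth claim at face value; batching, KV-cache traffic, and multi-device sharding all complicate the real picture.

PARAMS = 405e9
BYTES_PER_PARAM = 2                       # assume 16-bit weights
weight_bytes = PARAMS * BYTES_PER_PARAM   # ~810 GB streamed per generated token

hbm_bandwidth = 3.35e12                   # ~3.35 TB/s, H100-class HBM (approximate)
wafer_bandwidth = 7000 * hbm_bandwidth    # using the article's "7,000x" figure

for name, bw in [("HBM-based GPU", hbm_bandwidth), ("wafer-scale SRAM", wafer_bandwidth)]:
    print(f"{name}: ~{bw / weight_bytes:,.0f} tokens/s per user (upper bound)")
# HBM-based GPU:    ~4 tokens/s per device, which is why even sharded GPU deployments
#                   land well below 100 tokens/s on a 405B model
# wafer-scale SRAM: tens of thousands of tokens/s of headroom, so ~970 tokens/s is plausible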

The leap in performance achieved by Cerebras is dramatic.
Cerebras Systems

And the latest result is just stupendous. Look at the charts above for performance over time, and below to compare the competitive landscape for Llama 3.1-405B. The entire industry occupies the upper left quadrant of the chart, showing output speeds below the 100 tokens-per-second range for the Meta Llama 3.1-405B model. Cerebras produced some 970 tokens per second, all at roughly the same price as GPU and custom ASIC services like SambaNova: $6 per million input tokens and $12 per million output tokens.



Cerebras massively outperformed all other systems for Llama-3.1 405B.
Cerebras


Compared to the competition, using 1000 input tokens, Cerebras embarrassed GPUs which all produced less than 100 tokens per second. Only SambaNova even came "close" at 164 tokens. Now, as you know, there is no free lunch; a single CS3 is estimated to cost between $2-3M, though the exact price is not publicly disclosed by the company. But the performance, latency, and throughput amortize that cost over a massive number of users.


Cerebras throughput results are 12 times faster than the fastest GPU, and 6 times faster than competitor Samba Nova.
Cerebras

To put it into perspective, Cerebras ran the 405B model nearly twice as fast as the fastest GPU cloud ran the 1B model. Twice the speed on a model that is two orders of magnitude more complex.

Ok, this is just insanely fast.
Cerebras

As one should expect, the Cerebras CS3 also delivered excellent latency (time to first token), at barely over 1/2 the time of the Google Vertex service, and one sixth the time required by competitors SambaNova and Amazon.

Cerebras also wins in response time, as measured in seconds to first token.
Cerebras

Cerebras is quick to note this is just the first step. They have increased the performance of Llama 70B from 400 tokens per second to 2,200 t/s in just a little over three months. And while Blackwell will increase inference performance fourfold over Hopper, it will not come even close to the performance of Cerebras.

And Cerebras is just getting started on models like Llama 405B
Cerebras


But who needs this level of performance?


Ok, so nobody can read anywhere close to 1,000 tokens per second, which translates into about 500 words. But computers certainly can and do. And inference is undergoing a transformation, from simple queries to becoming a component in agentic AI and multi-query AI to provide better results.

"By running the largest models at instant speed, Cerebras enables real-time responses from the world's leading open frontier model," noted Mr. Feldman. "This opens up powerful new use cases, including reasoning and multi-agent collaboration, across the AI landscape." OpenAI's o1 may demand as much as 10 times the compute of GPT-4o, and agentic AI coupled with chain of thought requires over 100 times the performance available on today's fastest GPUs.

Chain of thought, as seen with OpenAI's o1 service, and agentic AI are two of the examples requiring [...]
Cerebras


Cerebras and Molecular Dynamics Simulation


Since this week is SuperComputing '24, Cerebras also announced an amazing scientific accomplishment. The CS3 was able to deliver 1.2 million simulation steps per second, a new world record. That's 700 times faster than Frontier, the world's fastest supercomputer. This means that scientists can now perform 2 years' worth of GPU-based simulations in a single day on a single Cerebras System. And this benchmark is based on the older CS-2 WSE!

Cerebras CS-2 blows the Frontier SuperComputer out of the water for molecular dynamics simulation
Cerebras


Conclusions


Instead of scaling AI training to produce more accurate answers, chain-of-thought reasoning explores different avenues and provides better answers. This "think before answering" approach provides dramatically better performance on demanding tasks like math, science, and code generation, fundamentally boosting the intelligence of AI models without requiring additional training. By running over 70x faster than other solutions, Cerebras Inference allows AI models to "think" far longer and return more accurate results. As agentic AI becomes available and eventually widespread, the demands on inferencing hardware will increase by another 10-fold.

Nothing even comes close to Cerebras in these emerging advancements in AI.
 












1/7
@Doyin_CL1
Google just launched Learn About, an innovative AI tool designed to enhance learning!

#LearnAbout #AI #EdTech #Google #LearningJourney



GcS_wKdWEAA_X6V.png


2/7
@Doyin_CL1
📚 Unlike traditional chatbots like Gemini or ChatGPT, Learn About is powered by Google's LearnLM model, which is grounded in educational research to align with how people learn best.



3/7
@Doyin_CL1
🖼️ One standout feature is its focus on visuals and interactive content, making information easier to understand and remember.



4/7
@Doyin_CL1
🔍 In a direct comparison with Google Gemini on the prompt, "How big is the universe?", both tools provided the same answer: "about 93 billion light-years in diameter."



5/7
@Doyin_CL1
📊 However, their presentations differed significantly! Gemini featured a Wikipedia diagram along with a summary and source links, while Learn About used an image from Physics Forums and offered related educational content.



6/7
@Doyin_CL1
🗣️ Learn About even includes "why it matters" sections and "Build your vocab" features, offering context and definitions for terms!

✨ In summary, Learn About enriches learning with visuals, contextual info, and vocabulary aids, while Gemini leans towards straightforward facts.



7/7
@Doyin_CL1
🤔 It's not just about factual answers; Learn About even addresses quirky questions! For example, when asked about the "best glue for pizza," it flagged this as a "common misconception."

📚 Who knew AI could explain concepts like a study buddy?







1/1
🎓 I Tried Google's New AI Tool for Learning: Here's How It Went! 🎓

Google's experimental AI tool Learn About is a game-changer for educational exploration! 🌟 Designed as a learning companion, it's not just another chatbot; it's powered by the LearnLM model and built specifically for answering deep, research-based questions. 📚🤖

Here’s what makes it stand out:
• Engaging formats: interactive guides, quizzes, and curated videos/photos. 🎥📝
• Research-based summaries and deeper context than Google Search or Gemini.
• Wide range of topics: think "What causes earthquakes?" to "Does money buy happiness?" 🌍💭

When I tried it, the tool provided an engaging mix of summaries and visuals, making complex topics easier to digest. But here's the catch: can it truly revolutionize learning, or is it just another AI novelty?

What's your take? Is this the future of education, or are we just scratching the surface? Let's talk below! 👇✨ #AI #Education #EdTech







1/1
@juandoming
How Google's LearnLM generative AI models support teachers and learners | How generative AI expands curiosity and understanding with LearnLM



Gcbc-1CW8AApvGi.jpg







1/2
@ohdearitsmandy
📚 Google's new "Learn About" AI goes beyond traditional chatbots like Gemini or ChatGPT, offering a more interactive, educational experience! Built on the LearnLM model, it focuses on guiding users through topics with textbook-style responses, visuals, and "why it matters" boxes. 🌌🧠

Whether it's explaining the size of the universe or debunking myths (yes, glue on pizza isn't a thing!), this AI tool aims to make learning more engaging and in-depth. Could this be the future of AI in education?

Can't wait to try it! Unfortunately, it does not seem to be available in Germany yet...

#AI #EdTech #GoogleAI #LearnAbout #Gemini



GcRofXIWsAAbJix.jpg


2/2
@ohdearitsmandy
Source: Google's AI 'learning companion' takes chatbot answers a step further




 






1/11
@venturetwins
Spotify quietly launched text-to-playlist in the U.S. ✨

You give it a prompt - it can be quite lengthy! - and it generates a playlist for you.

It can also incorporate your listening data (e.g. "feature songs from my recent favorite artists")



https://video.twimg.com/ext_tw_video/1858562151516172290/pu/vid/avc1/720x1476/-Ri_bOLEgMqPX_ix.mp4

2/11
@Keshavatearth
wait! listening data is integrated? I'm confused as to how that'd get implemented.



3/11
@venturetwins
probably RAG



4/11
@Protonimus
That's awesome! That would be great for super specific requests and moods! "Give me triumphant Roman Empire music after their latest victory over the Carthaginians but with a science fiction feel..." Then it gets lazy and just repeats The Imperial March from Star Wars 😂



5/11
@nigeleccles
It is still going to play you Pink Pony Club but this time it's going to gaslight you into thinking you asked for it



6/11
@PJaccetturo
Fascinating. Easy to see how this leads to "generate me a song about..."



7/11
@Zenomercer
*yawn*

I need calendar&to-dolist-to-personalized alarm to wake me up in the morning.



8/11
@pratjoey
This is great



9/11
@GurdevSambali
Interesting. Can’t wait to try this out



10/11
@steven_finch
I have been using this for a while and it feels like it is populating far more major label tracks. Struggling with indie tracks and playlists in general



11/11
@jroebuck
Nice!!!!











1/11
@hedgedog5
Spotify now has a feature where you can create an AI Playlist by giving it any prompt you want. Here's an example of me giving it a try 👇🏻



https://video.twimg.com/amplify_video/1858926186757271552/vid/avc1/720x1556/Urj0VcKl-6AweG2r.mp4

2/11
@aweknowing
WWIII playlist?



3/11
@hedgedog5
This is what would be played in WWIII?



4/11
@KilladiVaira
This is new feature I am hearing first time I will try bro



5/11
@hedgedog5
Yes very good new feature



6/11
@boujeecheese
The future is now



7/11
@hedgedog5
Yes sir



8/11
@amanarora_0x
I will try, thanks for sharing this



9/11
@hedgedog5
You're welcome



10/11
@1Senabato
Spotify knows the pulse of public very well...



11/11
@Juicy112_
This is gonna be so useful ngl




 



SeedEdit-APP-V1.0 - a Hugging Face Space by ByteDance

Hi, SeedEdit - Doubao Team













1/12
@EHuanglu
SeedEdit: ByteDance's Open-Source Photoshop Killer

It is an AI-powered image editor that's completely free (link in the comments).

With this tool, you can edit images effortlessly using simple text prompts.

Here’s what you can do:
- Change facial expressions and poses
- Edit text within images
- Seamlessly swap backgrounds
- Adjust lighting and restyle scenes

9 examples:



https://video.twimg.com/ext_tw_video/1857808231478996996/pu/vid/avc1/1280x720/O75IOPp9pvVP4PZG.mp4

2/12
@EHuanglu
SeedEdit Demo 1 - Change Facial Expression

Prompt:
[Close his eyes and smile]



https://video.twimg.com/ext_tw_video/1857811731676106752/pu/vid/avc1/700x900/MLMZJaIUzu1Skgme.mp4

3/12
@EHuanglu
SeedEdit Demo 2 - Change Text Within Image

Prompt:
[Change the words to "cheap price"]



https://video.twimg.com/ext_tw_video/1857813748427476992/pu/vid/avc1/720x720/Vz7WaxHXjn03lwk7.mp4

4/12
@EHuanglu
SeedEdit Demo 3 - Adjust Lighting

Prompt:
[Studio light from left side]



https://video.twimg.com/ext_tw_video/1857814750769606656/pu/vid/avc1/720x720/r0DVAsZPyRTKumlT.mp4

5/12
@EHuanglu
SeedEdit Demo 4 - Change Background

Prompt:
[Let it be flying over the ocean]



https://video.twimg.com/ext_tw_video/1857816739029061632/pu/vid/avc1/700x900/Nta845n2n-Wymkzw.mp4

6/12
@EHuanglu
SeedEdit Demo 5 - Change Background

Prompt:
[Make it a wizard]



https://video.twimg.com/ext_tw_video/1857818057328259072/pu/vid/avc1/720x720/7kSyNPSpQzFKg83_.mp4

7/12
@EHuanglu
SeedEdit Demo 6 - Remove Object

Prompt:
[Empty street in a quite night]



https://video.twimg.com/ext_tw_video/1857820165175652353/pu/vid/avc1/1280x720/k9KUnNkPvln8hBhc.mp4

8/12
@EHuanglu
SeedEdit Demo 7 - Adjust Style

Prompt:
[The house is above the sky, fantasy style]



https://video.twimg.com/ext_tw_video/1857822838079762434/pu/vid/avc1/1280x720/5GzAPk70MZa44Fkf.mp4

9/12
@EHuanglu
SeedEdit Demo 8 - Change Pose

Prompt:
[Let she look to her right]



https://video.twimg.com/ext_tw_video/1857822882833014784/pu/vid/avc1/700x900/YCRMHjSWyI90Xatc.mp4

10/12
@EHuanglu
SeedEdit Demo 9

Prompt:
[Replace the rabbit with a fawn]



https://video.twimg.com/ext_tw_video/1857823147007119360/pu/vid/avc1/700x900/YEEk4wn429o0clCX.mp4

11/12
@EHuanglu
Try it for free here:

SeedEdit-APP-V1.0 - a Hugging Face Space by ByteDance



12/12
@EHuanglu
If you find this useful, follow me @EHuanglu

I'll keep testing out the latest AI tools and sharing everything I discover.



GchFh4KaMAEfdHx.jpg







1/1
@MoolahRevive
yo, huggingface just made it drop with seededit! 🚀💥 forget the complicated stuff, this open-source editor lets u edit images with just words! 🤯 what's your first creation gonna be? #AI #OpenSource #SeedEdit #Innovation



https://video.twimg.com/ext_tw_video/1858892624226033664/pu/vid/avc1/1200x720/9UWIGJKZ35OVjBTQ.mp4


 


About


Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.



promptfoo.dev

promptfoo produces matrix views that let you quickly evaluate outputs across many prompts and inputs:

prompt evaluation matrix - web viewer

It works on the command line too:

Prompt evaluation

It also produces high-level vulnerability and risk reports:

gen ai red team

Why choose promptfoo?

There are many different ways to evaluate prompts. Here are some reasons to consider promptfoo:

  • Developer friendly: promptfoo is fast, with quality-of-life features like live reloads and caching.
  • Battle-tested: Originally built for LLM apps serving over 10 million users in production. Our tooling is flexible and can be adapted to many setups.
  • Simple, declarative test cases: Define evals without writing code or working with heavy notebooks.
  • Language agnostic: Use Python, Javascript, or any other language.
  • Share & collaborate: Built-in share functionality & web viewer for working with teammates.
  • Open-source: LLM evals are a commodity and should be served by 100% open-source projects with no strings attached.
  • Private: This software runs completely locally. The evals run on your machine and talk directly with the LLM.
 


About


A template engine for LLM prompts with support for writing prompts with prompts



metaprompt-lang.org/


Metaprompt is a domain-specific language for LLM prompt engineering. It is a template engine for textual prompts, where expression expansion can depend on LLM outputs.

The goal is to extend the usual techniques of parametrized prompts with programmability, reusability and meta-prompting abilities.

Quick example

The text you are reading right now is a valid metaprompt program.

[# this is a comment that is ignored by the interpreter, that can be
used to add some info for the human-developer]

[# This whole text is a parametrized prompt, one of the parameters
being [:subject]]

[# [:subject] here is a variable reference. Variables can be defined
in-place, or passed from the external environment]

Give me a detailed poetic description of [:subject], using one or more
of the following metaphoric expressions:

[# Now I want to specialize my prompt depending on the value of
[:subject]. The output of the prompt below will be included *instead*
of the [$ ... block]: ]

[$ Write me a bullet list of metaphors for [:subject]. Do not produce
any other output]

[# Conditionals allow for logic branching: ]

[:if [:subject] is a human
:then
Use jokingly exaggerated style
:else
Include some references to [$ List some people who have any
relation to [:subject], comma-separated]
]


See examples/ for more.
 




November 12, 2024

Building a Large Geospatial Model to Achieve Spatial Intelligence

Eric Brachmann and Victor Adrian Prisacariu



At Niantic, we are pioneering the concept of a Large Geospatial Model that will use large-scale machine learning to understand a scene and connect it to millions of other scenes globally.

When you look at a familiar type of structure – whether it's a church, a statue, or a town square – it's fairly easy to imagine what it might look like from other angles, even if you haven't seen it from all sides. As humans, we have "spatial understanding" that means we can fill in these details based on countless similar scenes we've encountered before. But for machines, this task is extraordinarily difficult. Even the most advanced AI models today struggle to visualize and infer missing parts of a scene, or to imagine a place from a new angle. This is about to change: Spatial intelligence is the next frontier of AI models.

As part of Niantic's Visual Positioning System (VPS), we have trained more than 50 million neural networks, with more than 150 trillion parameters, enabling operation in over a million locations. In our vision for a Large Geospatial Model (LGM), each of these local networks would contribute to a global large model, implementing a shared understanding of geographic locations, and comprehending places yet to be fully scanned.

The LGM will enable computers not only to perceive and understand physical spaces, but also to interact with them in new ways, forming a critical component of AR glasses and fields beyond, including robotics, content creation and autonomous systems. As we move from phones to wearable technology linked to the real world, spatial intelligence will become the world's future operating system.

What is a Large Geospatial Model?

Large Language Models (LLMs) are having an undeniable impact on our everyday lives and across multiple industries. Trained on internet-scale collections of text, LLMs can understand and generate written language in a way that challenges our understanding of "intelligence".

Large Geospatial Models will help computers perceive, comprehend, and navigate the physical world in a way that will seem equally advanced. Analogous to LLMs, geospatial models are built using vast amounts of raw data: billions of images of the world, all anchored to precise locations on the globe, are distilled into a large model that enables a location-based understanding of space, structures, and physical interactions.

The shift from text-based models to those based on 3D data mirrors the broader trajectory of AI's growth in recent years: from understanding and generating language, to interpreting and creating static and moving images (2D vision models), and, with current research efforts increasing, towards modeling the 3D appearance of objects (3D vision models).



Geospatial models are a step beyond even 3D vision models in that they capture 3D entities that are rooted in specific geographic locations and have a metric quality to them. Unlike typical 3D generative models, which produce unscaled assets, a Large Geospatial Model is bound to metric space, ensuring precise estimates in scale-metric units. These entities therefore represent next-generation maps, rather than arbitrary 3D assets. While a 3D vision model may be able to create and understand a 3D scene, a geospatial model understands how that scene relates to millions of other scenes, geographically, around the world. A geospatial model implements a form of geospatial intelligence, where the model learns from its previous observations and is able to transfer knowledge to new locations, even if those are observed only partially.

While AR glasses with 3D graphics are still several years away from the mass market, there are opportunities for geospatial models to be integrated with audio-only or 2D display glasses. These models could guide users through the world, answer questions, provide personalized recommendations, help with navigation, and enhance real-world interactions. Large language models could be integrated so understanding and space come together, giving people the opportunity to be more informed and engaged with their surroundings and neighborhoods. Geospatial intelligence, as emerging from a large geospatial model, could also enable generation, completion or manipulation of 3D representations of the world to help build the next generation of AR experiences. Beyond gaming, Large Geospatial Models will have widespread applications, ranging from spatial planning and design, logistics, audience engagement, and remote collaboration.

Our work so far

Over the past five years, Niantic has focused on building our Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in our games and Scaniverse.

With VPS, users can position themselves in the world with centimeter-level accuracy. That means they can see digital content placed against the physical environment precisely and realistically. This content is persistent in that it stays in a location after you've left, and it's then shareable with others. For example, we recently started rolling out an experimental feature in Pokémon GO, called Pokémon Playgrounds, where the user can place Pokémon at a specific location, and they will remain there for others to see and interact with.

Niantic's VPS is built from user scans, taken from different perspectives and at various times of day, at many times during the years, and with positioning information attached, creating a highly detailed understanding of the world. This data is unique because it is taken from a pedestrian perspective and includes places inaccessible to cars.



Today we have 10 million scanned locations around the world, and over 1 million of those are activated and available for use with our VPS service. We receive about 1 million fresh scans each week, each containing hundreds of discrete images.

As part of the VPS, we build classical 3D vision maps using structure from motion techniques - but also a new type of neural map for each place. These neural models, based on our research papers ACE (2023) and ACE Zero (2024) do not represent locations using classical 3D data structures anymore, but encode them implicitly in the learnable parameters of a neural network. These networks can swiftly compress thousands of mapping images into a lean, neural representation. Given a new query image, they offer precise positioning for that location with centimeter-level accuracy.
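A minimal sketch of the scene-coordinate-regression idea behind this kind of "neural map" (in the spirit of the ACE papers cited above): a small scene-specific head maps per-patch image features to 3D coordinates in that scene's frame, and pose is then recovered from the resulting 2D-3D matches with PnP and RANSAC. The sizes and layers below are illustrative assumptions, not Niantic's actual architecture.

import torch
import torch.nn as nn

class SceneCoordinateHead(nn.Module):
    """Scene-specific regressor: patch features -> (x, y, z) in the scene's metric frame."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 3),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (num_patches, feat_dim) from a shared, pretrained backbone
        return self.mlp(patch_features)

head = SceneCoordinateHead()
features = torch.randn(2048, 512)    # features extracted from one query image
scene_xyz = head(features)           # (2048, 3) predicted scene coordinates
# Pairing scene_xyz with the patches' 2D pixel locations gives 2D-3D matches,
# which a PnP + RANSAC solver turns into a centimeter-accurate camera pose.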



Niantic has trained more than 50 million neural nets to date, where multiple networks can contribute to a single location. All these networks combined comprise over 150 trillion parameters optimized using machine learning.

From Local Systems to Shared Understanding

Our current neural map is a viable geospatial model, active and usable right now as part of Niantic's VPS. It is also most certainly "large". However, our vision of a "Large Geospatial Model" goes beyond the current system of independent local maps.

Entirely local models might lack complete coverage of their respective locations. No matter how much data we have available on a global scale, locally, it will often be sparse. The main failure mode of a local model is its inability to extrapolate beyond what it has already seen and from where the model has seen it. Therefore, local models can only position camera views similar to the views they have been trained with already.

Imagine yourself standing behind a church. Let us assume the closest local model has seen only the front entrance of that church, and thus, it will not be able to tell you where you are. The model has never seen the back of that building. But on a global scale, we have seen a lot of churches, thousands of them, all captured by their respective local models at other places worldwide. No church is the same, but many share common characteristics. An LGM is a way to access that distributed knowledge.

An LGM distills common information in a global large-scale model that enables communication and data sharing across local models. An LGM would be able to internalize the concept of a church, and, furthermore, how these buildings are commonly structured. Even if, for a specific location, we have only mapped the entrance of a church, an LGM would be able to make an intelligent guess about what the back of the building looks like, based on thousands of churches it has seen before. Therefore, the LGM allows for unprecedented robustness in positioning, even from viewpoints and angles that the VPS has never seen.

The global model implements a centralized understanding of the world, entirely derived from geospatial and visual data. The LGM extrapolates locally by interpolating globally.

Human-Like Understanding

The process described above is similar to how humans perceive and imagine the world. As humans, we naturally recognize something we've seen before, even from a different angle. For example, it takes us relatively little effort to back-track our way through the winding streets of a European old town. We identify all the right junctions although we had only seen them once and from the opposing direction. This takes a level of understanding of the physical world, and cultural spaces, that is natural to us, but extremely difficult to achieve with classical machine vision technology. It requires knowledge of some basic laws of nature: the world is composed of objects which consist of solid matter and therefore have a front and a back. Appearance changes based on time of day and season. It also requires a considerable amount of cultural knowledge: the shapes of many man-made objects follow specific rules of symmetry or other generic types of layouts – often dependent on the geographic region.

While early computer vision research tried to decipher some of these rules in order to hard-code them into hand-crafted systems, it is now consensus that such a high degree of understanding as we aspire to can realistically only be achieved via large-scale machine learning. This is what we aim for with our LGM. We have seen a first glimpse of impressive camera positioning capabilities emerging from our data in our recent research paper MicKey (2024). MicKey is a neural network able to position two camera views relative to each other, even under drastic viewpoint changes.



MicKey can handle even opposing shots that would take a human some effort to figure out. MicKey was trained on a tiny fraction of our data – data that we released to the academic community to encourage this type of research. MicKey is limited to two-view inputs and was trained on comparatively little data, but it still represents a proof of concept regarding the potential of an LGM. Evidently, to accomplish geospatial intelligence as outlined in this text, an immense influx of geospatial data is needed – a kind of data not many organizations have access to. Therefore, Niantic is in a unique position to lead the way in making a Large Geospatial Model a reality, supported by more than a million user-contributed scans of real-world places we receive per week.

Towards Complementary Foundation Models

An LGM will be useful for more than mere positioning. In order to solve positioning well, the LGM has to encode rich geometrical, appearance and cultural information into scene-level features. These features will enable new ways of scene representation, manipulation and creation. Versatile large AI models like the LGM, which are useful for a multitude of downstream applications, are commonly referred to as "foundation models".

Different types of foundation models will complement each other. LLMs will interact with multimodal models, which will, in turn, communicate with LGMs. These systems, working together, will make sense of the world in ways that no single model can achieve on its own. This interconnection is the future of spatial computing – intelligent systems that perceive, understand, and act upon the physical world.

As we move toward more scalable models, Niantic's goal remains to lead in the development of a large geospatial model that operates wherever we can deliver novel, fun, enriching experiences to our users. And, as noted, beyond gaming Large Geospatial Models will have widespread applications, including spatial planning and design, logistics, audience engagement, and remote collaboration.

The path from LLMs to LGMs is another step in AI's evolution. As wearable devices like AR glasses become more prevalent, the world's future operating system will depend on the blending of physical and digital realities to create a system for spatial computing that will put people at the center.
 





















1/38
@franklyn_wang
Doubling o1-preview performance on ARC-AGI with one simple trick 🚀

tldr: by providing human-like representations to o1, we are able to substantially increase performance on @arcprize.



Gcq275tWsAAYEOx.jpg

Gcq3BcQXsAAsZWE.jpg


2/38
@franklyn_wang
AI is really smart; its scores on math contests are no joke. But ARC Prize should be much easier than math contests, and yet frontier models generally do not score very well.



3/38
@franklyn_wang
This is mainly because frontier models cannot really "see" the grid; imagine a genius trying to do ARC Prize with their eyes closed -- seems unlikely to bear fruit!



4/38
@franklyn_wang
ARC Prize problems are not strictly well-posed -- there are many patterns that take every input to its output, but only one mapping is evident to people.



5/38
@franklyn_wang
You might ask, how do we get the representations? Do we use tree diffusion or autoencoding? The answer is no! We use an extensive set of handwritten heuristics to capture patterns that are salient to people.
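As a hedged illustration of the kind of handwritten heuristic this could involve (our guess at one plausible building block, not the authors' released code), decomposing a grid into connected same-colored "objects" is a natural first step toward a human-like representation:

from collections import deque

def extract_objects(grid):
    """Return (color, cells) pairs for each 4-connected group of same-colored cells."""
    h, w = len(grid), len(grid[0])
    seen, objects = set(), []
    for r in range(h):
        for c in range(w):
            if (r, c) in seen or grid[r][c] == 0:   # treat color 0 as background
                continue
            color, cells, queue = grid[r][c], set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                cells.add((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen and grid[nr][nc] == color:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            objects.append((color, cells))
    return objects

grid = [[0, 1, 1, 0],
        [0, 1, 0, 2],
        [0, 0, 0, 2]]
print(extract_objects(grid))   # one color-1 object with 3 cells, one color-2 object with 2 cells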



6/38
@franklyn_wang
For example, here's a simple case -- the algorithm recognizes the shape being re-colored.



Gcq4VZfWEAIa_5P.jpg

Gcq4b3QX0AEFfYN.png


7/38
@franklyn_wang
We also show a more complicated example -- note our ability to handle augmentations!



Gcq4nRPW8AAk9rX.jpg

Gcq4vctWcAEElEg.png


8/38
@franklyn_wang
We can even handle occlusions!



Gcq46xsWcAAUnsG.jpg

Gcq5BFUWIAAOl5o.png


9/38
@franklyn_wang
Because our method essentially represents each grid as an abstract syntax tree, we call our method Pattern Extraction and Abstraction for Cognitive Heuristics (PEACH) -- as peaches grow on trees, unlike other "reasoning" fruits.



10/38
@franklyn_wang
Then, we simply ask frontier LLMs like o1 to code the mapping -- and find that a small handful of samples is enough for strong results, far fewer than the thousands used in prior art!
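A hedged sketch of the generate-and-verify loop this implies (a later reply in the thread confirms the few-sample code synthesis at a high level): sample candidate programs from the model and keep one that reproduces every training pair. sample_program stands in for a call to a frontier model and is hypothetical.

def verify(program_src, train_pairs):
    """Run a candidate program (expected to define transform(grid)) on the training pairs."""
    namespace = {}
    try:
        exec(program_src, namespace)
        return all(namespace["transform"](x) == y for x, y in train_pairs)
    except Exception:
        return False

def solve(train_pairs, test_input, sample_program, num_samples=8):
    for _ in range(num_samples):
        src = sample_program(train_pairs)      # hypothetical LLM call, prompted with the
        if verify(src, train_pairs):           # human-like grid representations
            namespace = {}
            exec(src, namespace)
            return namespace["transform"](test_input)
    return None                                # no candidate reproduced the training pairs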



Gcq6cnGXQAEaYLW.jpg


11/38
@franklyn_wang
We find these results encouraging for solving this problem in a "human-like way". We also emphasize that we do not fine tune any models, so there's plenty more juice!



12/38
@franklyn_wang
All large efforts require a team. I’d like to thank @gopalkgoel1, @kattian_ , @minimario1729, Justin Zhang, @yunyu_l, @rahulgs, @cool_cocohearts, @fluorane, and @jacobtpl for all their helpful contributions, both conceptual and practical.



13/38
@franklyn_wang
I'd also like to acknowledge Yunyu Lin (@yunyu_l) and David Petersen (@typesfaster) for financial support used to conduct this research.



14/38
@franklyn_wang
But this is just the beginning! If interested, please DM me here to discuss potential collaborations.



15/38
@unsorsodicorda
What's BARC here? A GOFAI approach?



16/38
@franklyn_wang
Hi, BARC refers to this excellent work (GitHub - xu3kev/BARC: Bootstrapping ARC) by @xu3kev @HuLillian39250 @ZennaTavares @evanthebouncy @BasisOrg @ellisk_kellis



17/38
@BenFerrum
Sir this is awesome, what happens when there is an unseen pattern which has not been hard-coded? Do you see a way that the AI creates the representation for itself?



18/38
@franklyn_wang
The LLMs are often able to figure out some pretty interesting patterns that would be hard to figure out otherwise. :-)



19/38
@wpenman
Are you saying that the handcrafted heuristics you’ve developed are applied to the unseen instances too to encode them for o1? And then o1 does few-sample code synthesis, which you verify automatically to identify the best candidate?



20/38
@franklyn_wang
Yes, the handcrafted heuristics are applied to lots of unseen instances as well. We then do few-sample code synthesis.
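
A minimal sketch of the automatic verification step implied in the question above (illustrative only, not the authors' actual harness): each candidate program is executed against the training pairs and the first one that reproduces every output exactly is kept.

```python
# Illustrative only, not the authors' actual harness: execute each candidate
# program on the training pairs and keep the first one that reproduces
# every output exactly.

def passes_training(source, train_pairs):
    namespace = {}
    try:
        exec(source, namespace)          # candidate is expected to define transform(grid)
        transform = namespace["transform"]
        return all(transform(inp) == out for inp, out in train_pairs)
    except Exception:
        return False                     # crashing or malformed candidates are rejected

def select_candidate(candidates, train_pairs):
    for source in candidates:
        if passes_training(source, train_pairs):
            return source
    return None                          # no candidate verified on the training pairs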



21/38
@vimota
How does your representation logic β€œknow” that it’s been re-colored and is the same object (all the positions are the same). Curious where you think the line is between encoding rules in that logic vs just representation



22/38
@franklyn_wang
I agree that logic and representation have blurred lines. I like to think a representation is good if it's enough, when given in the prompt, to solve the problem.



23/38
@CRO_conaut
impressive fren



24/38
@alexkaplan0
Wow this is awesome



25/38
@mandeepabagga
All we need is a better image encoder for LLMs



26/38
@JasonRute
Interesting approach! I have the same question for you as the other group with a similar ARC-AGI score:

[Quoted tweet]
Why don’t you submit this to ARC-AGI-Pub (arcprize.org/arc-agi-pub)? That way there is no doubt you are doing better than the best public eval models (and semi-public eval)? (Or is it that your model takes longer than the 12 hours required.)


27/38
@threadreaderapp
Your thread is creating a buzz! #TopUnroll Thread by @franklyn_wang on Thread Reader App πŸ™πŸΌ @ain3sh for πŸ₯‡unroll



28/38
@ProbabilisticGM
I once heard a saying "choosing the representation is half the battle". But amazing discovery!



29/38
@jeevliberated
I feel like part of the allure of arc-agi is to figure out if the model is capable of doing this without being explicitly prompted



30/38
@AiAnvil
Reinforces my prior that models are capable of much more, but our prompting strategies are still so rudimentary. Intriguing that humanizing the representation was the way to go, though it makes sense in hindsight. Also seems model-agnostic. Excited to dig into it more tomorrow



31/38
@sheriffff0
Top. Congrats



32/38
@ultimate0164338
Interesting that people seem to understand the ridiculousness of the challenge; it's clear that many models out there match way more complex patterns. Still, it's talked about as if it's a serious benchmark. What is not realized is that this is only because of the amount of money



33/38
@permaximum88
Did you guys try 3.5 Sonnet old and/or new as well?



34/38
@ain3sh
@threadreaderapp unroll



35/38
@deltanym
this is interesting but it kind of feels like giving away the answer?



36/38
@florian_jue
But who does the PEACH abstraction into the syntax tree? Do you have to do that manually or does o1 do that? It feels like you need to understand the solution to construct the correct syntax tree, eg which shapes should belong together etc.



37/38
@matt_seb_ho
Congrats on the great results! Sorry if this is obvious, I just want to clarify that your AST representation of grids is created from a deterministic program (and not model generated)? Also, are there plans for paper or code release?



38/38
@stalkermustang
Great! Public LB submit wen?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



Summary of Franklyn Wang's Twitter Thread on Improving ARC-AGI Performance​

Introduction​

Franklyn Wang, a researcher in the field of Artificial Intelligence (AI), shared a detailed Twitter thread about how he and his team significantly improved the performance of an AI model called o1 on a challenge known as ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence). Here’s a simplified breakdown of what they did and why it matters.

The Challenge: ARC-AGI​

ARC-AGI is a series of problems designed to test an AI's ability to reason and understand patterns in grids. Despite being highly advanced, current AI models struggle with these problems because they can't "see" the patterns in the same way humans do.

The Problem with Current Models​

Current AI models are very good at math and other complex tasks but perform poorly on ARC-AGI. This is because these models can't interpret visual patterns as easily as humans can. Imagine trying to solve a puzzle with your eyes closed; it's similarly challenging for these AIs.

The Solution: Human-Like Representations​

Wang and his team developed a method called PEACH (Pattern Extraction and Abstraction for Cognitive Heuristics) to help the AI model understand these patterns better. Here’s how it works:
  • Handcrafted Heuristics: Instead of using complex algorithms like tree diffusion or autoencoding, they used simple, handwritten rules to identify patterns that are obvious to humans.
  • Abstract Syntax Trees: They represented each grid as an abstract syntax tree, which is a way of structuring data that makes it easier for the AI to understand.

Examples and Results​

The team provided several examples showing how their method works:
  • Re-coloring Shapes: The algorithm can recognize when shapes are re-colored but remain the same.
  • Handling Augmentations: It can still recognize a pattern when the grid has been transformed (augmented).
  • Handling Occlusions: It can even deal with parts of the grid being hidden.
By using these representations, they found that the AI model o1 could solve ARC-AGI problems much more effectively with just a small handful of samples, rather than the thousands used in prior work.

Key Points​

  • No Fine-Tuning: They didn't fine-tune any models; this means there's still room for improvement.
  • Team Effort: The project involved contributions from multiple researchers.
  • Financial Support: They acknowledged financial support from Yunyu Lin and David Petersen.

Community Reaction​

The thread generated a lot of interest and discussion:
  • Some users praised the approach and its potential.
  • Others asked about handling unseen patterns and whether the AI could create its own representations.
  • There were questions about submitting their results to official benchmarks and releasing their code.

Contextualizing Everything​

Why It Matters​

Improving performance on ARC-AGI is significant because it shows that AIs can be made to reason more like humans if given the right tools. This has implications for various fields where pattern recognition and reasoning are crucial.

Future Directions​

Wang mentioned that this is just the beginning and invited potential collaborators to discuss further work. There are plans for possibly releasing a paper or code related to this research.

Broader Implications​

This work reinforces the idea that current prompting strategies for AIs are rudimentary and that humanizing these strategies can lead to better results. It also suggests that models are capable of much more than what they currently achieve due to limitations in how we interact with them.

In summary, Franklyn Wang's team developed a simple yet effective method (PEACH) to help an AI model better understand visual patterns in grids, significantly improving its performance on ARC-AGI challenges without needing extensive training data or fine-tuning. This approach has sparked interest and discussion within the AI community about how to make AIs reason more like humans.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,435
Reputation
8,509
Daps
160,153











1/12
@sunomusic
v4 is here! πŸ”₯ We’re super excited to introduce v4, enabling you to make any song you can imagine, with better audio, sharper lyrics, and more dynamic song structures. Try it now on Suno!

New features powered by v4:

✨ Remaster: Upgrade your tracks in v4 quality

πŸ“ Lyrics: Creative, higher quality lyrics for your songwriting

🎨 Cover Art: Fresh designs to complement your music’s vibe



https://video.twimg.com/ext_tw_video/1858913100948295680/pu/vid/avc1/720x1280/pXa5Pt0AjeZBc7o3.mp4

2/12
@sunomusic
Also now supercharged in v4:

🎢 Covers: Reimagine your track in new styles

🎭 Personas: Unlock your musical alter-ego and maintain a consistent sound

v4 is now available in beta to Pro and Premier 🎡: Tag us in your songs @sunomusic - we can’t wait to hear what you create!



3/12
@jackdbrody
V5 when?



4/12
@sunomusic
😭🀣



5/12
@DanteLarrauri
Loviu so much



Gcw3jdQWIAA4Itd.png


6/12
@sunomusic




Gcw8emTW8AAvOHj.jpg


7/12
@PushTheFrontier
Just tried it. Huge improvement!!



8/12
@sunomusic
thank you!!



9/12
@nanosapien1
congrats on the release!



10/12
@sunomusic
thank you πŸ™



11/12
@geeknik
Hey, don't y'all know I have work to do!! πŸ˜†



12/12
@sunomusic
please suno responsibly




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/1
@MoritaHiroyo
πŸ‘»New Song: "The Diligent Ghost"πŸ‘»
I used v4. The music is great, but the sound quality is amazingly beautiful! Thank you, Suno!
1: Original (Composition & Performance by Me)
2: Created with #AI "@sunomusic"
#ClassicalComposePerformAI #GenerativeAI #Suno
@m3we_world



https://video.twimg.com/ext_tw_video/1859247792499367936/pu/vid/avc1/412x732/5D9rpyW5fR7NYb-m.mp4


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,435
Reputation
8,509
Daps
160,153



1/67
@deepseek_ai
πŸš€ DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!

πŸ” o1-preview-level performance on AIME & MATH benchmarks.
πŸ’‘ Transparent thought process in real-time.
πŸ› οΈ Open-source models & API coming soon!

🌐 Try it now at DeepSeek
#DeepSeek



Gc0zgl8bkAAMTtC.jpg


2/67
@deepseek_ai
🌟 Impressive Results of DeepSeek-R1-Lite-Preview Across Benchmarks!



Gc0zl7WboAAnCTS.jpg


3/67
@deepseek_ai
🌟 Inference Scaling Laws of DeepSeek-R1-Lite-Preview
Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.



Gc0zpWDbAAA6T-I.jpg
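
For readers who want to measure this kind of trend on their own samples, here is a minimal, illustrative sketch (not DeepSeek's evaluation code) that buckets attempts by reasoning-token count and reports accuracy per bucket:

```python
# Illustrative only, not DeepSeek's evaluation code: bucket attempts by the
# number of reasoning tokens used and report accuracy per bucket, which is
# the kind of curve shown in the plot above.

from collections import defaultdict

def accuracy_by_thought_length(records, bucket_size=1000):
    """records: iterable of (reasoning_tokens, is_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])          # bucket -> [n_correct, n_total]
    for tokens, correct in records:
        bucket = (tokens // bucket_size) * bucket_size
        totals[bucket][0] += int(correct)
        totals[bucket][1] += 1
    return {b: c / n for b, (c, n) in sorted(totals.items())}

# Made-up numbers, just to show the shape of the output.
print(accuracy_by_thought_length([(500, False), (1500, True), (1800, True), (3200, True)]))
```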


4/67
@paul_cal
Tic tac toe will be cracked one day. Not today



Gc089xTXMAACjDa.jpg

Gc08-ujXQAAgw-h.jpg

Gc09G3bX0AAInTf.jpg


5/67
@paul_cal
Very impressive! Esp transparent CoT and imminent open source release

I get it's hard to compare w unreleased o1's test time scaling without an X axis, but worth noting o1 full supposedly pushes higher on AIME (~75%)

What's with the inconsistent blue lines though?



Gc04oxYW4AAG4QQ.jpg

Gc04vKAXQAAFSTd.png


6/67
@Yuchenj_UW
Congrats! looking forward to the open-source models and hosting them on Hyperbolic!



7/67
@itaybachman
thank you for forcing OpenAI to release o1 full πŸ₯°



8/67
@koltregaskes
Amazing work. Bettering o1-preview is a huge achievement!



9/67
@GorkaMolero
How is this free?



10/67
@_Holistech_
Deep Think is awesome, congratulations! Reading the inner thoughts removes the black-box impression you get from OpenAI o1, and it's fun to read. I am very impressed.



11/67
@shytttted




Gc1bHeHXQAAej19.jpg


12/67
@MarkusOdenthal
Nice, I'm looking forward to the API.



13/67
@MaximeRivest
The problem with all those cool new models when you're a builder is that every new model coming out needs a thorough review and evaluation.



14/67
@XENOWHITEx
Yeah boi we accelerating. Lfg πŸ—£οΈπŸ—£οΈπŸ—£οΈ



15/67
@Presidentlin
Impressive very nice, just over 2 months. Still would like VL2




16/67
@eoft_ai
Sheeeeesh, great work! The gap closes day by day



17/67
@AtaeiMe
Open source sooner rather than later pls! Is the white paper coming as well?



18/67
@stochasticchasm
Very great work!



19/67
@JasonBotterill3
the chinese are here



Gc1YgU0akAAq2pt.jpg


20/67
@sauravpanda24
Open source o1 alternative would be a great model for fine tuning LLMs for other tasks. Super excited to try them out!



21/67
@DustinBeachy
Always great to have another brain to work with. Great job!!



22/67
@metsikpalk
It got this right with Deep Think mode enabled. But it took 65 seconds and wrote me an entire book. But okay, fair, it’s trying super hard to find the answer. Perhaps increase the model's own answering confidence? πŸ˜€



Gc1UjnIWoAAHVh3.jpg

Gc1UjnCWgAE5zpb.jpg


23/67
@victor_explore
when everyone's releasing "lite" models, you know ai's gone full tech fashion season



24/67
@Emily_Escapor
I hope these numbers are accurate, not just for hype.



25/67
@mrsiipa
is this due to the nature of training data or something else? this model is not able to answer these problems correctly

[Quoted tweet]
i tried this bash coding example from @OpenAI 's o1 blogpost with @deepseek_ai 's model, it thinks for 12 seconds but the final code does not work.

The output should be:
[1,3,5],[2,4,6]


Gc1Eu0DbUAAypQ2.jpg

Gc1FisGbsAA0V-s.jpg


26/67
@SystemSculpt
The whale surfaces again for a spectacular show.



27/67
@johnkhfung
Been quiet for a while and cooking something great!
It is really good. I am waiting for the API access



28/67
@NotBrain4brain
They did this to undercut OpenAI full o1 releases today



29/67
@IncKingdomX
This is impressive!



30/67
@bbbbbwu
man it's so cool



31/67
@mintisan
very nice work, bro...



32/67
@NaanLeCun
Chinese AI labs>US AI labs



33/67
@etemiz
Mr. altman, tear, down, this wall!



34/67
@BobTB12
It fails this test. o1-preview knows the strawberry is still on the bed, as it falls out.



Gc1OCz-WcAANqs7.png


35/67
@WilliamLamkin




36/67
@leo_agi
will you release a tech report?



37/67
@RobIsTheName2
It is "Lite" but competes with o1-preview? Curious to see non "Lite" r1



38/67
@vlad3ciobanu
I'm blown away. The visible chain of thought is a major breakthrough for open AI research. Congratulations!



39/67
@AnAcctOfAllTime
LLMs when they need to make a countdown from 100 to 1 and replace all multiples of 7 with a specific word:



Gc1mASHXgAACrHF.png
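
For reference, the deterministic version of the task being joked about is only a few lines of Python; "buzz" here is just a placeholder for whatever the "specific word" is.

```python
# The deterministic version of the joked-about task; "buzz" is just a
# placeholder for whatever the "specific word" is.

def countdown(word="buzz"):
    return [word if n % 7 == 0 else str(n) for n in range(100, 0, -1)]

print(", ".join(countdown()))
```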


40/67
@AI_GPT42
Can I get API access to DeepSeek-R1-Lite-Preview?



41/67
@allformodel
this is insane. when API?



42/67
@HrishbhDalal
will you open source it as well? would be amazing!!



43/67
@jimmycarter173
Looking forward to the paper! Yours are always a great read!

Will this be a closed-source model?



44/67
@judd2324
Awesome! Hope to see a open weight release.



45/67
@DavidAIinchina
It was impressive to see the results.



Gc1lsXMbQAAaE6Q.jpg


46/67
@99_frederick
A wave of applications of distilled long-form reasoning data is coming to specific domains soon. #NLP #LLM



47/67
@lehai0609
You are GOAT. Take my money!!!



48/67
@zeroth_e
I tried it and it still seems to be worse than o1-preview at coding in certain tasks, but I think its math capabilities are better. It's around even, and I really want OpenAI to release o1-full now



49/67
@Cesarfalante
Thats incredible!
50 messages a day is honestly more than enough for the average person if you only use the reasoning model for reasoning tasks (and traditional LLMs for other stuff).
Great job, cheers from Brazil!



50/67
@Ashton_Waystone
newbee



51/67
@khaldounbouzai1
The most important thing is transparency about the dataset, code, and paper, so the whole open-source community can cooperate better



52/67
@btc4me2
thank you for showing the full reasoning traces! @OpenAI your move



53/67
@jermiah_64
Absolutely wild! Props to you guys



54/67
@Cyclops02448225
Every time I am about to get a GPT/Claude subscription, DeepSeek or Qwen drops a banger.



55/67
@PatrikarSanket
If a mobile app isn't in the plans, could you consider a pwa? Would love to have that on the phone to use.



56/67
@RThermo56
There is no moat.

Nvidia is the only winner.



57/67
@_CorvenDallas_
I just tried it and I'm impressed 😁



58/67
@abtb168
congrats on the release! πŸ₯³



59/67
@marvijo99
Link to the paper please



60/67
@zaeppang316902
πŸ”₯πŸ”₯πŸ”₯πŸ”₯ @OpenAI β˜„οΈ



61/67
@JimMcM4
Thank you! I'll test this out. One tip though, I was able to bypass agreeing to your ToS and Privacy Policy when signing up through Google Auth. DM me if you need details.



62/67
@revmarsonal
Great work!



63/67
@bowtiedjconnor
@CerebrasSystems, run this model and the world is yours.



64/67
@0xEbadi
Deep think enable vs disable



Gc1mwfDXwAAn_IY.jpg


65/67
@vasej79628
wow you are fast



https://video.twimg.com/ext_tw_video/1859240500533833730/pu/vid/avc1/720x720/LARBE1WPCfev9ctA.mp4

66/67
@noxlonga
smart enough to realize that -5 can be the answer, but hallucinated a positive-integer requirement



Gc1BYrTWgAAX8fc.png


67/67
@modelsarereal
you should avoid anti-consciousness training

[Quoted tweet]
here is the answer of DeepSeek's new Chinese model "DeepThink"

Seems to be a trained anti-consciousness answer, to avoid the AI appearing to have any kind of conscious behavior.


Gc1CvzIXYAASn6Z.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 