bnew



1/3
@rohanpaul_ai
Reverse Engineering OpenAI's o1 Architecture with Claude 👀



2/3
@NorbertEnders
The reverse-engineered OpenAI o1 architecture, simplified and explained in a more narrative style using layman’s terms.
I used Claude 3.5 Sonnet for that.

Keep in mind: it’s just an educated guess



3/3
@NorbertEnders
Longer version:

Imagine a brilliant but inexperienced chef named Alex. Alex's goal is to become a master chef who can create amazing dishes on the spot, adapting to any ingredient or cuisine challenge. This is like our language model aiming to provide intelligent, reasoned responses to any query.

Alex's journey begins with intense preparation:

First, Alex gathers recipes. Some are from famous cookbooks, others from family traditions, and many are creative variations Alex invents. This is like our model's Data Generation phase, collecting a mix of real and synthetic data to learn from.

Next comes Alex's training. It's not just about memorizing recipes, but understanding the principles of cooking. Alex practices in a special kitchen (our Training Phase) where:

1. Basic cooking techniques are mastered (Language Model training).
2. Alex plays cooking games, getting points for tasty dishes and helpful feedback when things go wrong (Reinforcement Learning).
3. Sometimes, the kitchen throws curveballs - like changing ingredients mid-recipe or having multiple chefs compete (Advanced RL techniques).

This training isn't a one-time thing. Alex keeps learning, always aiming to improve.

Now, here's where the real magic happens - when Alex faces actual cooking challenges (our Inference Phase):

1. A customer orders a dish. Alex quickly thinks of a recipe (Initial CoT Generation).
2. While cooking, Alex tastes the dish and adjusts seasonings (CoT Refinement).
3. For simple dishes, Alex works quickly. For complex ones, more time is taken to perfect it (Test-time Compute).
4. Alex always keeps an eye on the clock, balancing perfection with serving time (Efficiency Monitoring).
5. Finally, the dish is served (Final Response).
6. Alex remembers this experience for future reference (CoT Storage).

The key here is Alex's ability to reason and adapt on the spot. It's not about rigidly following recipes, but understanding cooking principles deeply enough to create new dishes or solve unexpected problems.
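
To make that pipeline concrete, here is a minimal sketch of the hypothesized inference loop, with toy stand-ins so the control flow actually runs. Same caveat as above: this is an educated guess, and every name here is hypothetical rather than anything confirmed about o1.

```python
from dataclasses import dataclass

# Hypothetical o1-style inference loop (an educated guess, not OpenAI's code).
# Toy stand-in functions play the role of the model so the flow is runnable.

@dataclass
class Critique:
    good_enough: bool
    notes: str

def generate_cot(query: str) -> str:                 # Initial CoT Generation
    return f"Step 1: restate the problem: {query}"

def critique_cot(query: str, cot: str) -> Critique:  # self-check of the chain
    return Critique(good_enough=len(cot) > 120, notes="add detail")

def refine_cot(cot: str, c: Critique) -> str:        # CoT Refinement
    return cot + f"\nNext step ({c.notes})."

def compute_budget(query: str) -> int:               # Test-time Compute:
    return 2 if len(query) < 40 else 8               # harder queries get more steps

def answer(query: str, store: list) -> str:
    cot = generate_cot(query)
    for _ in range(compute_budget(query)):           # Efficiency Monitoring:
        c = critique_cot(query, cot)                 # the loop is capped, and we
        if c.good_enough:                            # stop early once refining
            break                                    # stops helping
        cot = refine_cot(cot, c)
    response = f"Answer based on reasoning:\n{cot}"  # Final Response
    store.append((query, cot, response))             # CoT Storage (feedback loop)
    return response

print(answer("How many 3-digit primes end in 7?", store=[]))
```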

What makes Alex special is the constant improvement. After each shift, Alex reviews the day's challenges, learning from successes and mistakes (feedback loop). Over time, Alex becomes more efficient, creative, and adaptable.

In our language model, this inference process is where the real value lies. It's the ability to take a query (like a cooking order), reason through it (like Alex combining cooking knowledge to create a dish), and produce a thoughtful, tailored response (serving the perfect dish).

The rest of the system - the data collection, the intense training - is all in service of this moment of creation. It's crucial, but it's the behind-the-scenes work. The real magic, the part that amazes the 'customers' (users), happens in this inference stage.

Just as a master chef can delight diners with unique, perfectly crafted dishes for any request, our advanced language model aims to provide insightful, reasoned responses to any query, always learning and improving with each interaction.





bnew

1/28
@CodeByPoonam
Google just dropped a bombshell

NotebookLM can now turn your notes into a Podcast in minutes.

I'll show you how in just 3 easy steps:



2/28
@CodeByPoonam
Google introduces a new Audio Overview feature that can turn documents, slides, charts, and more into engaging discussions with one click.

To try it out, follow these steps:

1/ Go to NotebookLM: Sign in - Google Accounts
- Create a new notebook.



3/28
@CodeByPoonam
2/ Add at least one source.
3/ In your Notebook guide, click on the “Generate” button to create an Audio Overview.



4/28
@CodeByPoonam
I uploaded my newsletter edition: AI Toast.

With one click, two AI hosts start up a lively “deep dive” discussion based on your sources.

Listen here 🔊



5/28
@CodeByPoonam
Read more here:
OpenAI released the next big thing in AI



6/28
@CodeByPoonam
Thanks for reading.

Get the latest AI updates and tutorials in your inbox for FREE.

Join my AI Toast community of 22,000 readers:
AI Toast



7/28
@CodeByPoonam
Don't forget to bookmark for later.

If you enjoyed reading this post, please support it with like/repost of the post below 👇

[Quoted tweet]
Google just dropped a bombshell

NotebookLM can now turn your notes into a Podcast in minutes.

I'll show you how in just 3 easy steps:


8/28
@hasantoxr
Perfect guide 🙌🙌



9/28
@CodeByPoonam
Thanks for checking



10/28
@iamfakhrealam
It's surprising



11/28
@codedailyML
Amazing Share



12/28
@codeMdSanto
That's a game-changer! Technology never fails to amaze. Can't wait to see how it works!



13/28
@shawnchauhan1
That's awesome! Turning notes into a podcast that fast seems like a total productivity hack.



14/28
@AndrewBolis
Creating podcasts is easier than ever



15/28
@EyeingAI
Impressive guide, thanks for sharing.



16/28
@Klotzkette
It’s OK, but you can’t really give it any direction, so it’s useless



17/28
@vidhiparmxr
Helpful guide, Poonam!



18/28
@arnill_dev
That's like magic! Can't wait to see how it works. Exciting stuff!



19/28
@alifcoder
That's amazing! Turning notes into a podcast sounds so convenient.

Can't wait to see how it works.



20/28
@leo_grundstrom
Really cool stuff, thanks for sharing Poonam!



21/28
@LearnWithBishal
Wow this looks amazing



22/28
@shushant_l
This has made podcast creation super easy



23/28
@Parul_Gautam7
Excellent breakdown

Thanks for sharing Poonam



24/28
@jxffb
Just did one! So awesome!



25/28
@iam_kgkunal
That's amazing...Turning notes into a podcast so quickly sounds like a game-changer for productivity



26/28
@chriskclark
Here’s how we implemented this AI app in real life (yesterday).

[Quoted tweet]
was playing with NotebookLM today as well. Here’s how I implemented the audio podcast mode (what I’m calling it) on an article today. You can listen to the AI generated conversation here —> agingtoday.com/health/fall-p…


27/28
@DreamWithO
I'd love to see this in action, how's the audio quality compared to traditional podcasting software?



28/28
@ThePushkaraj
The AI space is getting crazier day by day!




1/13
@minchoi
Google dropped NotebookLM recently.

AI tool that can generate podcasts of two speakers talking about the contents of various sources like research papers, articles, and more.

Absolutely bonkers.

100% AI 🤯

10 examples (and how to try):

1. AI Podcast about OpenAI o1 drop



2/13
@minchoi
2. AI Podcast from Newsletter

[Quoted tweet]
Very impressed with this new NotebookLM feature by Google Labs that turns notes/docs into podcasts

I uploaded this morning's newsletter, and it turned into a two-way podcast between two AI agent hosts

Give it a listen, pretty darn good (sound on 🔈)


3/13
@minchoi
3. AI Podcast from 90 min lecture

[Quoted tweet]
Googles NotebookLM's new podcast feature is wild

This is made from a 90min lecture I held on Monday

It condensed it into a 16 minute talkshow

Some hallucinations here and there, but overall this is a new paradigm for learning.

Link to try it below, no waitlist


4/13
@minchoi
4. AI Podcast from book "The Infernal Machine"

[Quoted tweet]
Rolling out audio overviews at NotebookLM today. So excited for this one.

Take any collection of sources and automatically generate a "deep dive" audio conversation.

I created one based on the text of my book The Infernal Machine. Have a listen. 🧵below

notebooklm.google.com


5/13
@minchoi
5. AI Podcast from Research Paper

[Quoted tweet]
So, Google just dropped #NotebookLM, an AI that creates podcast segments on research papers nearly instantly.

Here's the thing though, it doesn't check to see if anything you feed it is true, sooooo I plugged in my found footage creepypasta.

The results are amazing.😄

@labsdotgoogle


6/13
@minchoi
6. AI Podcast from Overview of NotebookLM

[Quoted tweet]
Just had my 3rd wow moment in AI... this time through AI Overview by NotebookLM 🤯


7/13
@minchoi
7. AI Podcast from paper "On the Category of Religion"

[Quoted tweet]
🤯 My mind is genuinely blown by Google's NotebookLM new Audio Overview feature. It creates a podcast for a document.

Here's a podcast for our paper "On the Category of Religion" that @willismonroe created.

I genuinely would not have known it was AI...


8/13
@minchoi
8. AI Podcast from System Card for OpenAI o1

[Quoted tweet]
Do you want to see something impressive?
This podcast isn’t real.
It’s AI-generated: I gave Google’s NotebookLM the system card for OpenAI’s new o1 model, and it produced a 10-minute podcast discussion that feels incredibly real, and is better, more informative, and more entertaining than most actual tech podcasts.


9/13
@minchoi
9. AI Podcast from News reports on "Black Myth: Wukong"

[Quoted tweet]
Using NotebookLM to quickly generate English news coverage of "Black Myth: Wukong"

As everyone already knows, NotebookLM is an AI note-taking service from Google. It can integrate all kinds of document files, links, and plain text for free, generating summaries, tables of contents, Q&A, and more for you.

Today it launched Audio Overview, which produces a conversational show from the contents of your notes. The length depends on how much material you have; generation takes roughly 10 minutes or less, and it's currently English-only.

I used the "Black Myth: Wukong" material I already had on hand to produce the following:


10/13
@minchoi
10. AI Podcast from College thesis

[Quoted tweet]
This AI service is so impressive! Google's NotebookLM is now capable of generating an audio overview based on documents uploaded and links to online resources.

I uploaded my bachelors thesis, my resume, and a link to my online course website and it created this really cool podcast like format.

It didn't get everything right, but it's so funny because NotebookLM actually drew great conclusions that I didn’t think about while writing this thesis myself.

Which AI tool could create a video for this audio file?

@labsdotgoogle #RenewableEnergy #offgridpower #batterystorage #SolarEnergy #AI


11/13
@minchoi
Try it out yourself, head over to 👇
Sign in - Google Accounts



12/13
@minchoi
If you enjoyed this thread,

Follow me @minchoi and please Bookmark, Like, Comment & Repost the first Post below to share with your friends:

[Quoted tweet]
Google dropped NotebookLM recently.

AI tool that can generate podcasts of two speakers talking about the contents of various sources like research papers, articles, and more.

Absolutely bonkers.

100% AI 🤯

10 examples (and how to try):

1. AI Podcast about OpenAI o1 drop


13/13
@minchoi
If you want to keep up with the latest AI developments and tools, subscribe to The Rundown; it's FREE.

And you'll never miss a thing in AI again:
The Rundown AI





bnew

DeepMind understands Strawberry - there is no moat


[Submitted on 6 Aug 2024]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters


Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of inference-time computation in LLMs, with a focus on answering the question: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? Answering this question has implications not only on the achievable performance of LLMs, but also on the future of LLM pretraining and how one should trade off inference-time and pre-training compute. Despite its importance, little research has attempted to understand the scaling behaviors of various test-time inference methods. Moreover, current work largely provides negative results for a number of these strategies. In this work, we analyze two primary mechanisms to scale test-time computation: (1) searching against dense, process-based verifier reward models; and (2) updating the model's distribution over a response adaptively, given the prompt at test time. We find that in both cases, the effectiveness of different approaches to scaling test-time compute critically varies depending on the difficulty of the prompt. This observation motivates applying a "compute-optimal" scaling strategy, which acts to most effectively allocate test-time compute adaptively per prompt. Using this compute-optimal strategy, we can improve the efficiency of test-time compute scaling by more than 4x compared to a best-of-N baseline. Additionally, in a FLOPs-matched evaluation, we find that on problems where a smaller base model attains somewhat non-trivial success rates, test-time compute can be used to outperform a 14x larger model.


Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2408.03314 [cs.LG]
(or arXiv:2408.03314v1 [cs.LG] for this version)
[2408.03314] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Submission history

From: Charlie Snell [view email]
[v1] Tue, 6 Aug 2024 17:35:05 UTC (4,152 KB)
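
The headline result is easy to see in a toy model. With a perfect verifier, the chance that at least one of n samples solves a prompt is pass@n = 1 - (1 - p)^n, where p is the per-sample solve rate. Here is a small sketch (my own illustration with made-up numbers, not the authors' code) of why shifting budget from easy prompts to hard ones beats spending it uniformly as best-of-N:

```python
# Toy model of compute-optimal test-time scaling vs. a best-of-N baseline.
# All solve rates and budgets below are invented for illustration.

def pass_at_n(p: float, n: int) -> float:
    """Expected solve probability with n samples and a perfect verifier."""
    return 1 - (1 - p) ** n

prompts = [0.9] * 50 + [0.05] * 50   # per-sample solve rates: 50 easy, 50 hard
TOTAL = 400                          # fixed total samples across 100 prompts

# Best-of-N baseline: every prompt gets the same number of samples.
uniform = sum(pass_at_n(p, TOTAL // len(prompts)) for p in prompts)

# Difficulty-aware allocation: easy prompts barely need samples,
# hard prompts get the remainder of the budget.
easy_n = 2
hard_n = (TOTAL - easy_n * 50) // 50
adaptive = sum(pass_at_n(p, easy_n if p > 0.5 else hard_n) for p in prompts)

print(f"uniform best-of-4:  {uniform:.1f}/100 expected solved")
print(f"adaptive budgeting: {adaptive:.1f}/100 expected solved")
```

Note that the adaptive policy here cheats by knowing p exactly; the paper's point is that estimating prompt difficulty well enough to approximate this allocation already yields large savings (their compute-optimal strategy is over 4x more efficient than best-of-N).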

 

bnew

1/11
@denny_zhou
What is the performance limit when scaling LLM inference? Sky's the limit.

We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.

[2402.12875] Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (ICLR 2024)
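
The construction behind this is easy to illustrate on one of the paper's own benchmark tasks. Iterated squaring is believed to be inherently serial: a shallow parallel circuit can't shortcut it, but a model allowed to emit T intermediate tokens can simply write the T steps down one at a time. A toy sketch of that chain (my illustration, not the authors' code):

```python
# Iterated squaring mod p: computing x^(2^T) mod p is conjectured to require
# ~T sequential squarings. Emitting one CoT line per squaring is exactly the
# "T intermediate reasoning tokens" the theorem pays for.

def iterated_squaring_cot(x: int, t: int, mod: int) -> list[str]:
    chain, v = [], x
    for i in range(1, t + 1):
        v = v * v % mod  # one inherently serial step per emitted line
        chain.append(f"step {i}: x^(2^{i}) mod {mod} = {v}")
    return chain

for line in iterated_squaring_cot(x=7, t=5, mod=1_000_003):
    print(line)
```

Without the intermediate lines, a constant-depth model would have to produce the final value in one shot, which is exactly what the expressiveness bounds rule out.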



2/11
@denny_zhou
Just noticed a fun YouTube video explaining this paper. LoL. Pointed out by @laion_ai http://invidious.poast.org/4JNe-cOTgkY



3/11
@ctjlewis
hey Denny, curious if you have any thoughts. i reached the same conclusion:

[Quoted tweet]
x.com/i/article/178554774683…


4/11
@denny_zhou
Impressive! You would be interested at seeing this: [2301.04589] Memory Augmented Large Language Models are Computationally Universal



5/11
@nearcyan
what should one conclude from such a proof if it’s not also accompanied by a proof that we can train a transformer into the state (of solving a given arbitrary problem), possibly even with gradient descent and common post training techniques?



6/11
@QuintusActual
“We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.”

I’m guessing this is only true because as a problem grows in difficulty, the # of required tokens approaches ♾️



7/11
@Shawnryan96
How do they solve novel problems without a way to update the world model?



8/11
@Justin_Halford_
Makes sense for verifiable domains (e.g. math and coding).

Does this generalize to more ambiguous domains with competing values/incentives without relying on human feedback?



9/11
@ohadasor
Don't fall into it!!

[Quoted tweet]
"can solve any problem"? Really?? Let's read the abstract in the image attached to the post, and see if the quote is correct. Ah wow! Somehow he forgot to quote the rest of the sentence! How is that possible?
The full quote is "can solve any problem solvable by boolean circuits of size T". This changes a lot. All problems solvable by Boolean circuits, of any size, is called the Circuit Evaluation Problem, and is known to cover precisely polynomial time (P) calculations. So it cannot solve the most basic logical problems which are at least exponential. Now here we don't even have P, we have only circuits of size T, which validates my old mantra: it can solve only constant-time problems. The lowest possible complexity class.
And it also validates my claim about the bubble of machine learning promoted by people who have no idea what they're talking about.


10/11
@CompSciFutures
Thx, refreshingly straightforward notation too, I might take the time to read this one properly.

I'm just catching up and have a dumb Q... that is an interestingly narrow subset of symbolic operands. Have you considered what happens if you add more?



11/11
@BatAndrew314
Noob question - how is this related to the universal approximation theorem? Meaning, can transformers solve any problem because they are neural nets? Or is it some different property of transformers and CoT?






[Submitted on 20 Feb 2024 (v1), last revised 23 May 2024 (this version, v3)]


Chain of Thought Empowers Transformers to Solve Inherently Serial Problems


Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma

Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length n, previous works have shown that constant-depth transformers with finite precision poly(n) embedding size can only solve problems in TC0 without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in AC0, a proper subset of TC0. However, with T steps of CoT, constant-depth transformers using constant-bit precision and O(log n) embedding size can solve any problem solvable by boolean circuits of size T. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.

Comments: 38 pages, 10 figures. Accepted by ICLR 2024
Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Machine Learning (stat.ML)
Cite as: arXiv:2402.12875 [cs.LG]
(or arXiv:2402.12875v3 [cs.LG] for this version)
[2402.12875] Chain of Thought Empowers Transformers to Solve Inherently Serial Problems


Submission history

From: Zhiyuan Li [view email]

[v1] Tue, 20 Feb 2024 10:11:03 UTC (3,184 KB)
[v2] Tue, 7 May 2024 17:00:27 UTC (5,555 KB)
[v3] Thu, 23 May 2024 17:10:39 UTC (5,555 KB)


 

bnew

1/11
@danielhanchen
A transformer's depth affects its reasoning capabilities, whilst model size affects its knowledge capacity

Highly recommend @ZeyuanAllenZhu's video on reasoning in transformers. Experiments show wider nets don't affect reasoning but more depth helps. Video: Invidious - search
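
A quick back-of-the-envelope example of the trade-off being discussed: per-layer parameters scale roughly as 12·d_model² (attention plus MLP, ignoring embeddings and biases, a standard approximation), so a deep narrow model and a shallow wide one can have the same "model size" while differing 4x in serial depth. The configs below are invented for illustration:

```python
# Two transformers with identical parameter counts but very different depth.
# The standard approximation 12 * d_model^2 per layer counts the Q/K/V/O
# projections (4 d^2) plus a 4x-expansion MLP (8 d^2).

def approx_params(layers: int, d_model: int) -> int:
    return 12 * d_model ** 2 * layers

deep_narrow = approx_params(layers=32, d_model=1024)
shallow_wide = approx_params(layers=8, d_model=2048)

print(f"deep/narrow  (32 x 1024): {deep_narrow / 1e6:.0f}M params")
print(f"shallow/wide ( 8 x 2048): {shallow_wide / 1e6:.0f}M params")
# Same size on paper; the claim is the 32-layer model reasons better.
```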



2/11
@fleetwood___
Same claim in the MobileLLM paper from @AIatMeta
https://arxiv.org/pdf/2402.14905



3/11
@danielhanchen
Oh interesting - forgot about this paper!!



4/11
@im_datta0
From Gemma 2 paper :smile:



5/11
@danielhanchen
Oh yep remember this! The Gemma 2 paper did many experiments and ablations - forgot depth and width was also an experiment they did!



6/11
@NicholasLiu77
Model size = hidden state size?



7/11
@danielhanchen
Oh model size as in number of parameters of the model! :smile:



8/11
@gerardsans
There’s absolutely no “reasoning” in Transformers.



9/11
@danielhanchen
"Reasoning" needs to be better defined, but the video did show that if you train the LLM on 15 interactions, it can generalize to higher-order interactions.



10/11
@inductionheads
I think they should be triangular - wider at first layers than later layers



11/11
@dejanseo
Daniel, it's time.

Unsloth-xxsmall-uncased
Unsloth-xsmall-uncased
Unsloth-small-uncased
Unsloth-base-uncased
Unsloth-large-uncased
Unsloth-xlarge-uncased
Unsloth-xxlarge-uncased

☝️





bnew

1/11
@Swarooprm7
Introducing NATURAL PLAN 🔥: a realistic planning benchmark in natural language!

Key features:
- 3 main tasks: Trip Planning, Meeting Planning, and Calendar Scheduling.
- Supplies all relevant information to the model in the context (e.g., Google Flights, Maps, Calendar).
- No need for a separate tool-use environment: direct LLM calls for evaluations.
- Assesses the planning capabilities of large language models (LLMs).

Joint work with my awesome collaborators at @GoogleDeepMind : @HuaixiuZheng , @hughbzhang , (now at Scale AI), @xinyun_chen_ , @chenmm24 , @Azade_na , @Hou_Le, @HengTze , @quocleix , @edchi ,@denny_zhou .

Paper: https://arxiv.org/pdf/2406.04520
Dataset and evaluation code will be released
[1/5]
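
Since every example inlines the tool information (flights, maps, calendars) in the prompt, evaluation reduces to one direct LLM call plus an exact-match check against the golden plan. A hypothetical sketch of that harness (the released code lives at google-deepmind/natural-plan; the field and function names below are made up):

```python
# Hypothetical NATURAL PLAN-style evaluation loop: no tool-use environment,
# just prompt -> completion -> exact-match solve rate.

def evaluate(examples, call_llm, parse_plan) -> float:
    solved = 0
    for ex in examples:
        prompt = ex["context"] + "\n" + ex["question"]   # tools already inlined
        plan = parse_plan(call_llm(prompt))
        solved += plan == ex["golden_plan"]              # exact match
    return solved / len(examples)

# Toy usage with a stubbed model:
toy = [{"context": "Flights: A->B, Mon 9am.",
        "question": "Plan a trip from A to B.",
        "golden_plan": "fly A->B Mon 9am"}]
print(evaluate(toy, call_llm=lambda p: " fly A->B Mon 9am ", parse_plan=str.strip))
```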



2/11
@Swarooprm7
NATURAL PLAN is a challenging benchmark for state-of-the-art models. For example, in Trip Planning, GPT-4 and Gemini 1.5 Pro could only achieve 31.1% and 34.8% solve rates, respectively.
[2/5]



3/11
@Swarooprm7
Model performance drops drastically as the complexity of the problem increases: e.g. in Trip Planning, all models perform below 5% when there are 10 cities, highlighting a significant gap in planning in natural language for SoTA LLMs.
[3/5]



4/11
@Swarooprm7
Self-correction does not help, and interestingly, the stronger models such as GPT-4 and Gemini 1.5 Pro suffer bigger losses than others.
[4/5]



5/11
@Swarooprm7
In-context planning experiments show promise: Gemini Pro 1.5 is able to leverage more in-context examples up to 355K tokens, still showing steady improvements.
[5/5]



6/11
@YiTayML
great work swaroop and steven!



7/11
@Swarooprm7
Thank you Yi



8/11
@qinyuan_ye
Cool work!! I've always wanted an AI assistant to plan for weekend fun with friends, accounting for the weather, traffic, carpooling, restaurants and everything... It feels like this will be possible soon!
And btw, Natural Questions ⏩ Instructions ⏩ Plans ⏩ What's next? 😉



9/11
@Swarooprm7
Yes, true AI assistant is the future.
Natural Questions ⏩ Instructions ⏩ Plans ⏩
Your pattern is absolutely spot on. Something else I am working on in that line is coming. Let the suspense be there until then 😀



10/11
@billyuchenlin
Awesome 👏 will try to implement SwiftSage & Lumos agents to see how local LLM agents and hybrid agents perform on it



11/11
@Swarooprm7
Thank you Bill.







1/1
NATURAL PLAN data and eval code is finally up 🔥.
Thank you everyone for your interest and patience!

GitHub - google-deepmind/natural-plan



bnew


1/1
No-Brainer to use Gemini Flash for vision: Fast, Inexpensive and Accurate!


1/11
@deedydas
Gemini 1.5 Flash is the model people are sleeping on.

It took ~5s to recognize all the books on my shelf. GPT-4o took ~25s!

And $1 gets you 13M tokens on Flash vs 200k tokens on 4o.
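
For reference, a minimal sketch of the kind of call being timed, assuming the google-generativeai Python SDK, Pillow, and an API key in the environment; the model name and the task come from the tweet, the rest is a guess at a typical setup:

```python
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Multimodal call: an image plus a text instruction in one request.
resp = model.generate_content(
    [Image.open("bookshelf.jpg"), "List every book title visible on this shelf."]
)
print(resp.text)

# Price comparison from the tweet: $1 buys ~13M Flash tokens vs ~200k GPT-4o
# tokens, i.e. roughly 13_000_000 / 200_000 = 65x more tokens per dollar.
```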



2/11
@deedydas
Here's ChatGPT's ~25s in comparison



3/11
@myotherme100
The GCP onboarding is hostile and Gemini is lobotomized.

Speed doesn't make up for it.



4/11
@deedydas
onboarding being bad is an unserious reason to not use a good model



5/11
@KewkD
Why do you believe text being output faster than anyone can read is beneficial or brag worthy, for any model?



6/11
@deedydas
Not all text output from models is meant for human consumption, and even when it is, lower latency empirically leads to higher user retention



7/11
@SteDjokovic
Did you check the results?

Gemini says “left and right” shelves, while GPT correctly identifies top, middle, and bottom.

The Elon Musk biography is on the right but Gemini categorised it as left.

Also, comparing Flash with GPT-4o instead of mini?



8/11
@OfficialLoganK
1.5 Flash multi-modal performance is truly wild for the price, this is going to power the next wave of AI startups.



9/11
@stevenheidel
give gpt-4o-mini a try! also returns results in a flash and is 30x cheaper than 4o



10/11
@0xshai
5 seconds is nuts! Awesome speed.

P.S: Musashi reader as well. 🫡



11/11
@RawSucces
If you want to bypass any AI and get the responses you want:

I’ve made a full video guide on how to do it. Simply reply with "AI", and I'll send it over to you. (Must follow so I can DM you.)

It is completely free





bnew

1/11
@lmsysorg
No more waiting. o1 is officially on Chatbot Arena!

We tested o1-preview and mini with 6K+ community votes.

🥇o1-preview: #1 across the board, especially in Math, Hard Prompts, and Coding. A huge leap in technical performance!
🥈o1-mini: #1 in technical areas, #2 overall.

Huge congrats to @OpenAI on this incredible milestone! Come try the king of LLMs and vote at http://lmarena.ai

More analysis below👇

[Quoted tweet]
Congrats @OpenAI on the exciting o1 release!

o1-preview and o1-mini are now live in Chatbot Arena accepting votes. Come challenge them with your toughest math/reasoning prompts!!
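
For context on how those ranks come out of votes: Arena-style leaderboards are fit from pairwise human preferences (Chatbot Arena uses a Bradley-Terry model; an online Elo update is the classic approximation of the same idea). A toy sketch with made-up votes:

```python
# Elo-style rating update from pairwise battle outcomes (illustrative only).

K = 32  # step size: how much one vote moves a rating

def elo_update(ra: float, rb: float, a_wins: bool) -> tuple[float, float]:
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))  # predicted win prob for A
    score_a = 1.0 if a_wins else 0.0
    delta = K * (score_a - expected_a)
    return ra + delta, rb - delta

ratings = {"o1-preview": 1000.0, "gpt-4o": 1000.0}
votes = [("o1-preview", "gpt-4o", True)] * 7 + [("o1-preview", "gpt-4o", False)] * 3

for a, b, a_wins in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_wins)

print({m: round(r) for m, r in ratings.items()})
```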


2/11
@lmsysorg
Chatbot Arena Leaderboard overview.

@openai's o1-preview #1 across the board, and o1-mini #1 in technical areas.



3/11
@lmsysorg
Win-rate heat map



4/11
@lmsysorg
Check out full results at http://lmarena.ai/leaderboard!



5/11
@McclaneDet
Given the latency, a human with Google could be o1. Be careful out there folks (especially check writers).



6/11
@_simonsmith
"AI is hitting a wall."



7/11
@axel_pond
very impressive.

thank you for your great service to the community.



8/11
@QStarETH
Math is the key to unlocking the secrets of the universe. We have arrived...



9/11
@Evinst3in
@sama after o1 is officially #1 across the board on Chatbot Arena😎



10/11
@JonathanRoseD
It seems like the new LLM meta is going to be training models on CoT strategies and relying on agents in the LLM clients. This has implications. Like, should @ollama consider preemptively adding CoT agents for future supporting models?



11/11
@andromeda74356
Can you add a feature where the user can give some text, you convert it to an embedding, and then show how models rank when only using chats that are close to that embedding, so we can see which models are best for our specific use cases?



