1/5
Chatbot Arena update!
The latest Gemini (Pro/Flash/Flash-8b) results are now live, with over 20K community votes!
Highlights:
- New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
- New Gemini-1.5-Pro (0827) shows strong gains in coding and math over previous versions.
- The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.
Big congrats to the @GoogleDeepMind Gemini team on the incredible launch!
More plots in the follow-up posts.
Note: to better reflect community interests, older models nearing deprecation will soon be removed from the default leaderboard view.
2/5
Overall Leaderboard with #votes and CIs:
3/5
Coding Arena: new Gemini-1.5-Pro improves significantly over previous versions.
4/5
The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.
5/5
Win-rate heatmap:
Check out the full leaderboard at http://lmarena.ai/?leaderboard
1/4
Open-source is the future.
I just tested it. It's really good and an improved version for generating code.
2/4
have you tried it?
3/4
not in coding right now but not that behind.
4/4
ohh no, typo mistake