1/6
To start, if you want to see Claude 3.5 Sonnet in action solving a simple pull request, here's a quick demo video we made.
(voiceover by the one and only @sumbhavsethia)
2/6
In our internal pull request eval, Claude 3.5 Sonnet passed 64% of our test cases.
For comparison, Claude 3 Opus passed only 38%.
3/6
3.5 Sonnet performed so well that it almost felt like it was playing with us on some of the test cases.
It would find the bug, fix it, and spend the rest of its output tokens going back and updating the repo documentation and code comments.
4/6
Side note: With Claude's coding skills plus Artifacts, I've already stopped using most simple chart, diagram, and visualization software.
I made the chart above in just 2 messages.
5/6
Back to PRs: Claude 3.5 Sonnet is the first model I've seen change the timelines of some of the best engineers I know.
This is a real quote from one of our engineers after Claude 3.5 Sonnet fixed a bug in an open source library they were using.
6/6
At Anthropic, everyone from non-technical people with no coding experience to tenured SWEs now uses Claude to write code that saves them hours of time.
Claude makes you feel like you have superpowers. Suddenly, no problem is too ambitious.
The future of programming is here, folks.