Will Large Language Models End Programming?

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,031
Reputation
8,229
Daps
157,710


1/78
@realGeorgeHotz
ChatGPT o1-preview is the first model that's capable of programming (at all). Saw an estimate of 120 IQ, feels about right.

Very bullish on RL in development environments. Write code, write tests, check work...repeat

Here it is writing tinygrad tests: https://chatgpt.com/share/66e693ef-1a50-8000-81ff-899498f9d052



2/78
@realGeorgeHotz
To paraphrase Terence Tao, it's "a mediocre, but not completely incompetent, software engineer"

Maybe it 404ed because I continued the context? Here's the WIP PR; just like with o1, you can imagine the "chain of thought" used :P

graph rewrite tests by geohot · Pull Request #6519 · tinygrad/tinygrad



3/78
@trickylabyrinth
the link is giving a 404.



4/78
@skidmarxist1
with chess the strongest player was human + ai combo for a while. Now it's just completely computer.

It feels like we are in that phase with IQ right now. The highest IQ is currently a combo of human + llm or ai. How long till it's just ai by itself?

Also how memory has become largely external from our body (phones). More and more of our IQ will be external (outside the skin). The agency center of mass is getting further away from our actual mass center of mass.



5/78
@WholeMarsBlog
link returns a 404 for some reason



6/78
@danielarpm
Conversation was blocked due to policies 👮



7/78
@JediWattzon22
I’m bearish on a 200 status



8/78
@nw3
AI with IQ of 120 is sufficiently devastating. Leaves room for true geniuses to innovate but smarter than vast majority of humanity.



9/78
@remusrisnov
The IQ test assessment does not tell you that the LLM used IQ tests and answers in its training set data. Not a useful measurement, @arcprize is better.



10/78
@yayavarkm
How is it at analysing complex data?!



11/78
@TroyMurs
I don’t know bro…homie thinks he is 150.

I’ve actually done this test on all the models and this is the first time it’s ever been over 140.



12/78
@sparbz
?? plenty of previous models can program (well)



13/78
@myronkoch
the chatGPT link you posted 404's



14/78
@romainsimon
Claude Sonnet 3.5 was already pretty good for some things



15/78
@shawnchauhan1
Natural language processing is poised to revolutionize how we interact with technology. It's the future of coding



16/78
@monguetown
I disagree. It can write the code you tell it to write especially in the context of an existing system. And incorporate new code into that legacy system. And optimize it.



17/78
@heuristics
That’s a skill issue. They have been capable of programming well for a while. You just have to specify what you want them to do.



18/78
@david_a_thigpen
Well, 404 error. I'm sure that the correct link will function the same way. e.g. add test for resource the prompt engineer controls



19/78
@gfodor
did you compare vs o1-mini? o1-mini is very good.



20/78
@TheAI2C
I will bet $3k in BTC that it can’t make a macro that continuously mouse clicks only while the physical left mouse button is held down on a GNU/Linux operating system without using a virtual machine.



21/78
@zoftie




22/78
@shundeshagen
When will programming as we know it today become obsolete?



23/78
@stevelizcano
o1-preview or mini? mini is supposed to be better at coding



24/78
@shw1nm
when you asked if the test or the code was incorrect, it said the code

was that correct?



25/78
@jmeierX
Natural language will be the next big coding language



26/78
@truthavatar777
The first thing I did with ChatGPT 4 was make it crawl through my company's codebase to extract the code from other non-git friendly assets. Then I loaded that as a knowledge file and it was promising. But what you're showing here is a dramatic step forward.



27/78
@Emily_Escapor
Good two more updates and we hit God mode 😁



28/78
@JD_2020
Small correction — this model more or less does the stuff o1 does, since last year, and consistently shows up. At a fraction of the cost of o1.

Just try it. It’s totally free for the moment since you ingress to the agentive workflow via ChatGPT

ChatGPT - No-Code Copilot 🤖 Build Apps & Games from Words!



29/78
@sauerlo
The 404 is the tinygrad test. We are the test subjects.



30/78
@sunsettler
Have you read Crystal society?



31/78
@akhileshutup
They took it down lmao



32/78
@TeslaHomelander
Giving power to true artists to form the future



33/78
@RatingsKick
404



34/78
@arnaud_petitpas
Can't access, blocked due to OAI policy it says



35/78
@jessyseonoob
If you can copy-paste in codepen please



36/78
@xmusfk
If I am not wrong, you used a prompt engineering technique called Chain of Thought, which might not work well with the o1 model according to the documentation. here is the tweet.

[Quoted tweet]
o1 experts, please follow these instructions instead of trying your out of the box logics. 💪


37/78
@ludvonrand




38/78
@Sachin1981KUMAR
I feel it's not IQ that is impressive but comparative speed against human mind.
It might have a higher IQ than an above-average human being, but there is no comparison to the speed with which it can solve problems. Not sure how that is being measured



39/78
@dhtikna
Have you tried Sonnet 3.5, in some benchmarks it still beats O1 in coding



40/78
@LukeElin
Been exploring and experimenting all weekend with it. Very impressed in some ways but underwhelmed in others.

Mixed bag. Future of these models looks bright



41/78
@RBoorsma
Study to test AI IQ:

[Quoted tweet]
Just plotted the new @OpenAI model on my AI IQ tracking page.

Note that this test is an offline-only IQ quiz that a Mensa member created for my testing, which is *not in any AI training data* (so scores are lower than for public IQ tests.)

OpenAI's new model does very well


42/78
@programmer_ke
openai police shut down your link



43/78
@DmitriyLeybel
Lol



44/78
@beattie20111
Amazing 🤩 results
Over 98k won yesterday.
People in my telegram channel keep winning with me everyday.
Don’t miss next game, click the link on my bio to join my telegram



45/78
@alocinotasor
I'll wait till its IQ measures mine.



46/78
@alex33902241
(At all) is crazy levels of delusion



47/78
@HaydnMartin_
Feels like we're very close to describing a change and a PR subsequently appearing.



48/78
@platosbeard69
I've had o1-mini give better coding solutions than o1-preview some of the time and the speed makes initial iteration on poorly specified natural language requests much nicer



49/78
@maxalgorhythm
404 not found on the chatgpt share link



50/78
@reiver




51/78
@ykssaspassky
lol it rewrote it for me - copy paste from GitHub



52/78
@muad_deab
"404 Not Found"



53/78
@LucaMiglioli185
I'm done



54/78
@uber_security
It's.. "robust", within a "framework".

So far 2/3 code run at first try.



55/78
@Kingtylernash
Have observed the same with hard code problems with which it usually couldn't help me before



56/78
@ITendoI
Guys... he said the "I" word.



57/78
@mario_meissner
What’s the difference between the current Cursor capabilities and the RL environment you describe?

I feel like I can already have a pretty much automated loop where I just supervise and give the next order.



58/78
@HX0DXs
could you please screenshot the test? is giving 404



59/78
@bruce_lambert
First model capable of programming? Uh oh, I better delete all that working code (in SAS, bash, Lisp, and Python) that AI has written for me since December 2022.



60/78
@OccupyingM
what's your guess on how and why it works?



61/78
@Xuniverse_
🤣 sorry, we will get superintelligence soon which can write programming codes.



62/78
@silxapp
openai fan boys is another thing



63/78
@0xAyush1
but can it build an open source autopilot driving software?



64/78
@crypto_nobody_
o1 vs Claude, Claude won in my testing when it came to coding



65/78
@drapersgulld
Try to use o1-mini, have found better general performance in it for now.

[Quoted tweet]
I think people are totally misunderstanding that you should be using o1-mini to run your coding + math tests.

OpenAI didn’t make this too clear in the primary o1 card but the o1-mini post (link below) makes this super clear.

On costs … o1-mini is around 30% cheaper than 4o.


66/78
@sameed_ahmad12
I think they took your link down.



67/78
@CreeK_
@sama "blocked due to your policies".. can you do some magic? We just want to see what George Hotz saw..



68/78
@leo11market
Is it better than Claude 3.5 in python programming?



69/78
@purusa0x6c
demn I got this



70/78
@Pomirkovany
Yeah dude, writing tinyguard tests is very impressive and proof that it's a capable programmer



71/78
@PoudelAarogya
truly the o1 is great. here is the reason:



72/78
@MoeWatn
Uh?



73/78
@DCDqyTu7V556229
Shared conversation seems deleted.



74/78
@yajusempaihomo
the conversation is 404. did you pour your whole code base into o1 preview? or it just did the job with like one file and a few hints?



75/78
@uki156
What does this mean "capable of programming at all"? I've been using models since GPT3 to do programming with a lot of satisfaction, and they've been getting better with each new release.
Your tweet is worded like I shouldn't believe my own eyes



76/78
@lu_Z2g
I don't get the IQ claims. If it had the intelligence of a 120 IQ human or even lower, it would be AGI. It's clearly not AGI. Its understanding completely breaks down on out of distribution questions.



77/78
@cosmichaosis
Higher IQ than me. 😅



78/78
@MHATZL101
All bullshyt the fukking thing can’t even do a basic chat with a human being for a hiring process like human resources. It’s so immediately and easily confused it is ridiculously inefficient and does not work at all. inoperable.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



 

bnew

o1-preview made a 3d FPS game fully in HTML. I have zero coding skills so it took a few tries but eventually it worked!

 

bnew


1/21
@leonsilicon
what if anybody can create anything with just their voice



https://video.twimg.com/amplify_video/1840123004224974848/vid/avc1/720x1280/sB9xKjeR6ksRYcDm.mp4

2/21
@IgorPauer
As a coder you know: Not anybody! Clients are not able to articulate their needs... ;)



3/21
@sanganisahil
dope



4/21
@theLucyChan
Can we go back to the past before rise of AI? I like when people write code and when I don't have to worry about some horrible future coming



5/21
@DrStarson
Great work @leonsilicon



6/21
@MMBeckerman
AI will be the ultimate democratization of software development. Now, the best IDEAS will determine what software gets created, no longer constrained by the lack of financial or technical resources. Now, watch and see what REALLY can be done when IDEAS actually rule the day.



7/21
@web3nam3
Impressive



8/21
@Chillicit
he sounds like a literal idiot who never leaves his computer.



9/21
@ae_estudios
Easiest part is to create, hardest part is to maintain, debug, scale, distribute, add more features, refactor, and the list can go on and on.



10/21
@airesearch12
reminds me of here at minute 3:05

[Quoted tweet]
#5 stackblitzbuddy - turn based ai coding assistant that can immediately host your ai generated webapp


https://video.twimg.com/ext_tw_video/1839694287220449280/pu/vid/avc1/1280x720/e23-jebbBkw5w2d1.mp4

11/21
@_rahulbali
WhAt is going On



12/21
@KevLXu
The project looks very very familiar🧐 reminds me of something that rhymes with "bulo" and starts with Tab



13/21
@LowMax
Bookmarking this to look back at in 10 years to see how silly or clairvoyant it was.



14/21
@Zero04203017
Fancy. But why do the students need you when they can simply ask the AI?



15/21
@ManuAGI01
🤯🤯



16/21
@b00ml00p
This is the way



17/21
@Bunagayafrost
just with voice, naturally👍



18/21
@ashakoen
That’s where we are headed and it’s quite lovely.



19/21
@IanXavieronX
Something like jarvis



20/21
@udiomaniak0
Impressive.



21/21
@ImLucasGrey
Still wondering why I'm not seeing as many multi modal AI interfaces. Especially with voice.




 