1/11
@denny_zhou
What is the performance limit when scaling LLM inference? Sky's the limit.
We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.
[2402.12875] Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (ICLR 2024)
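To make the "inherently serial" point concrete, here is a toy Python sketch (illustrative only, not from the paper): parity of n bits cannot be computed by constant-depth circuit families (it is outside AC0), yet once intermediate tokens are allowed, constant work per emitted token is enough — the serial chain does the heavy lifting.

```python
# Toy illustration (not from the paper): parity is "inherently serial" for
# constant-depth circuits, but trivial if you may emit intermediate tokens.
# Each chain-of-thought step does only constant work (one XOR), yet chaining
# the steps computes a function no constant-depth circuit family can.

def parity_with_cot(bits):
    """Compute parity by emitting a running-XOR 'reasoning token' per step."""
    cot = []            # the chain of thought: one intermediate token per bit
    state = 0
    for b in bits:
        state ^= b      # constant work per emitted token
        cot.append(state)
    return cot, state   # the final token is the answer

tokens, answer = parity_with_cot([1, 0, 1, 1, 0, 1])
print(tokens)   # [1, 1, 0, 1, 1, 0]
print(answer)   # 0 (an even number of 1s)
```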
2/11
@denny_zhou
Just noticed a fun YouTube video explaining this paper. LoL. Pointed out by @laion_ai
http://invidious.poast.org/4JNe-cOTgkY
3/11
@ctjlewis
hey Denny, curious if you have any thoughts. i reached the same conclusion:
[Quoted tweet]
x.com/i/article/178554774683…
4/11
@denny_zhou
Impressive! You would be interested in seeing this:
[2301.04589] Memory Augmented Large Language Models are Computationally Universal
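The idea in that paper, in a minimal hedged sketch (the transition table below is a stand-in for prompted LLM behavior, not the paper's actual prompting scheme): a fixed, finite step function plus an unbounded external read/write memory is enough to simulate a Turing machine.

```python
# Hedged sketch of the idea in arXiv:2301.04589: a fixed, finite "step"
# function (the LLM's role) plus unbounded external memory can simulate a
# Turing machine. This toy machine flips bits until it reaches a blank.

def run(transitions, tape, state="start", head=0, max_steps=100):
    memory = dict(enumerate(tape))            # unbounded external memory
    for _ in range(max_steps):
        symbol = memory.get(head, "_")        # read
        if (state, symbol) not in transitions:
            break                             # no rule -> halt
        state, write, move = transitions[(state, symbol)]
        memory[head] = write                  # write
        head += move                          # move head
    return "".join(memory[i] for i in sorted(memory))

# Flip 0s and 1s, halt at the first blank "_".
flip = {("start", "0"): ("start", "1", +1),
        ("start", "1"): ("start", "0", +1)}
print(run(flip, "10110"))  # 01001 (each bit flipped)
```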
5/11
@nearcyan
what should one conclude from such a proof if it’s not also accompanied by a proof that we can train a transformer into the state (of solving a given arbitrary problem), possibly even with gradient descent and common post training techniques?
6/11
@QuintusActual
“We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.”
I’m guessing this is only true because, as a problem grows in difficulty, the # of required tokens approaches infinity.
7/11
@Shawnryan96
How do they solve novel problems without a way to update the world model?
8/11
@Justin_Halford_
Makes sense for verifiable domains (e.g. math and coding).
Does this generalize to more ambiguous domains with competing values/incentives without relying on human feedback?
9/11
@ohadasor
Don't fall for it!!
[Quoted tweet]
"can solve any problem"? Really?? Let's read the abstract in the image attached to the post, and see if the quote is correct. Ah wow! Somehow he forgot to quote the rest of the sentence! How is that possible?
The full quote is "can solve any problem solvable by boolean circuits of size T". This changes a lot. The class of all problems solvable by Boolean circuits, of any size, is called the Circuit Evaluation Problem, and it is known to cover precisely polynomial-time (P) computations. So it cannot solve the most basic logical problems, which are at least exponential. And here we don't even have P; we have only circuits of size T, which validates my old mantra: it can solve only constant-time problems. The lowest possible complexity class.
And it also validates my claim about the bubble of machine learning promoted by people who have no idea what they're talking about.
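For reference on what "solvable by Boolean circuits of size T" means operationally, here is an illustrative Python sketch (the gate list is made up): evaluating a size-T circuit takes exactly T serial gate steps, one per gate — the kind of serial computation the paper says CoT tokens can emulate.

```python
# Illustrative only: a size-T Boolean circuit evaluated in T serial steps.

def eval_circuit(gates, inputs):
    """gates: list of (name, (op, in1, in2)) in topological order."""
    wires = dict(inputs)                      # wire name -> bit
    ops = {"AND": lambda a, b: a & b,
           "OR":  lambda a, b: a | b,
           "XOR": lambda a, b: a ^ b}
    for name, (op, a, b) in gates:            # T gates -> T serial steps
        wires[name] = ops[op](wires[a], wires[b])
    return wires

# (x XOR y) AND (x OR y), a circuit of size T = 3.
circuit = [("g1",  ("XOR", "x", "y")),
           ("g2",  ("OR",  "x", "y")),
           ("out", ("AND", "g1", "g2"))]
print(eval_circuit(circuit, {"x": 1, "y": 0})["out"])  # 1
```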
10/11
@CompSciFutures
Thx, refreshingly straightforward notation too, I might take the time to read this one properly.
I'm just catching up and have a dumb Q... that is an interestingly narrow subset of symbolic operands. Have you considered what happens if you add more?
11/11
@BatAndrew314
Noob question - how is this related to the universal approximation theorem? Meaning, can a transformer solve any problem because it is a neural net? Or is it some different property of transformers and CoT?