1/13
@picocreator

Transformers have hit the scaling wall

GPT-4.5 cost billions, with no clear path to AGI even at 10x the spend

Facebook's Yann LeCun is now saying we need new architectures

DeepMind CEO Demis Hassabis is saying we need another 10 years
We have another path to AGI in < 4 years
https://video.twimg.com/ext_tw_video/1904337130333302785/pu/vid/avc1/1280x720/aZlXTvyYsX_77HtR.mp4
2/13
@picocreator
At the heart of it, today's top models are
- Capable: of incredible PhD-level tasks & beyond
- (Un)Reliable: succeeding maybe 1 time out of 30
What everyone wants is not a smarter model
but a more reliable model, doing basic college-level tasks
Longer write-up:
Our roadmap to Personalized AI and AGI
3/13
@picocreator
To do the more boring things in life, like
- organize emails and receipts
- fill out forms
- order groceries
- be a friend
The things that actually matter...
All tasks which a 72B model is more than capable of,
if only it were more reliable
4/13
@picocreator
And that's where our work on Qwerky comes in.
Because the one thing holding these AI models and agents back...
is simply the lack of reliable understanding in memory. Memory, which is at the heart of recurrent models like RWKV...
[Quoted tweet]

Attention is NOT all you need

Using only 8 GPUs (not a cluster), we trained a Qwerky-72B (and a 32B) without any transformer attention.
With evals far surpassing GPT-3.5 Turbo, and closing in on GPT-4o-mini. All at 100x+ lower inference cost, via RWKV's linear scaling.
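To make the linear-scaling claim concrete, here is a minimal, hypothetical sketch in plain NumPy (not the actual RWKV kernels): softmax attention has to revisit every past token at each step, while a recurrent model carries a fixed-size memory state forward, so per-token cost stays flat as the context grows.

```python
import numpy as np

def attention_step(q_t, K_past, V_past):
    # Softmax attention: each new token attends over ALL past tokens,
    # so per-step cost grows with context length T (O(T^2) over a sequence).
    scores = K_past @ q_t / np.sqrt(len(q_t))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_past

def recurrent_memory_step(state, k_t, v_t, decay=0.99):
    # Recurrent "memory": a fixed-size state is updated once per token,
    # so per-step cost is constant regardless of context length (O(T) total).
    return decay * state + np.outer(k_t, v_t)

# Toy usage with tiny dimensions
d, T = 8, 5
rng = np.random.default_rng(0)
K, V = rng.standard_normal((T, d)), rng.standard_normal((T, d))
q = rng.standard_normal(d)
out_attn = attention_step(q, K, V)              # needs all T past keys/values
state = np.zeros((d, d))
for t in range(T):                              # streams one token at a time
    state = recurrent_memory_step(state, K[t], V[t])
```

The asymmetry above is the whole point of the "100x+ lower inference cost" claim: the recurrent path never has to grow its working set with the prompt length.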
5/13
@picocreator
Instead of scaling ever bigger, more expensive models, which cannot bring an ROI to investors,
what if we iterate faster, at <100B active parameters?
Making these already capable models more reliable instead. More personalizable. At a size with ROI.
[Quoted tweet]
Also, this new approach of treating the FFN/MLP as a separate, reusable building block
will allow us to iterate and validate changes to the RWKV architecture at larger scales, faster.
So expect bigger changes in our average ~6-month version cycle
(even I struggle to keep up)
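For intuition only, here is a rough, hypothetical sketch of that building-block idea in PyTorch. The HybridBlock class and the GRU stand-in are my own illustration, not the actual RWKV time-mix or the Qwerky conversion code: the pretrained FFN/MLP weights are reused and frozen, and only the swapped-in recurrent mixer is trained.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Hypothetical sketch: reuse a pretrained transformer FFN/MLP unchanged,
    and swap only the attention sub-layer for a recurrent time-mixer."""
    def __init__(self, pretrained_ffn: nn.Module, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Stand-in recurrent mixer (NOT the real RWKV time-mix); only this part trains
        self.time_mix = nn.GRU(dim, dim, batch_first=True)
        # Reused, frozen FFN/MLP weights from the original transformer
        self.ffn = pretrained_ffn
        for p in self.ffn.parameters():
            p.requires_grad = False

    def forward(self, x):                      # x: (batch, seq, dim)
        mixed, _ = self.time_mix(self.norm1(x))
        x = x + mixed                          # residual around the new recurrent mixer
        x = x + self.ffn(self.norm2(x))        # residual around the reused FFN
        return x

# Toy usage with a stand-in FFN
dim = 64
ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
block = HybridBlock(ffn, dim)
y = block(torch.randn(2, 16, dim))             # -> (2, 16, 64)
```

Keeping the FFN/MLP fixed is what lets architecture experiments focus compute on the swapped part instead of retraining the whole model.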
6/13
@picocreator
A model which can be "memory tuned" without catastrophic forgetting,
overcoming the barrier that puts finetuning out of reach for the vast majority of teams.
With quick, efficient personalization of AI models, to unlock reliable commercial AI agents, without compounding errors.
7/13
@picocreator
Memory is the secret to AGI
Once memory for personalized AI is mastered, where it can be reliably tuned by AI engineers with controlled datasets...
the next step is to get the AI model to prepare its own continuous training dataset without compounding loss
8/13
@picocreator
It's a binary question: is recurrent memory the path to AGI?
If so, this path to AGI is inevitable,
as all the critical ingredients are already here, and it is not bound by hardware, only by software.
You can read more details in our long-form write-up...
Our roadmap to Personalized AI and AGI
9/13
@AlphaMFPEFM
Interesting read, the reliability of daily fine-tuning with new memories is something I'm looking forward to. I hope you succeed!
10/13
@picocreator
Thank you
I believe all of us wish for more AI to do the boring chores in life... reliably, to our preferences
11/13
@life_of_ccb
Maybe Memory is All You Need?
Not only intellectually fascinating, but also pragmatically compelling.
12/13
@picocreator
Yup, we have internal debates on what size is required for human-level intelligence, once memory is "mastered".
Some argue 32B, some argue 72B, even 7B is in consideration.
We decided to play it safe with <100B.