1/12
@flowersslop
I finetuned 4o on a synthetic dataset where the first letters of responses spell "HELLO." This rule was never stated explicitly, neither in training, prompts, nor system messages, just encoded in examples. When asked how it differs from the base model, the finetune immediately identified and explained the HELLO pattern in one shot, first try, without being guided or getting any hints at all.
This demonstrates actual reasoning. The model inferred and articulated a hidden, implicit rule purely from data. That’s not mimicry; that’s reasoning in action.
2/12
@flowersslop
Some argue that it doesn’t truly reason but simply behaves as you would expect an LLM to behave, matching patterns in the data.
I tested this idea using GPT-3.5.
GPT-3.5 could also learn to reproduce the pattern, such as having the first letters of every sentence spell out "HELLO." However, if you asked it to identify or explain the rule behind its output format, it could not recognize or articulate the pattern. This behavior aligns with what you’d expect from an LLM: mimicking patterns observed during training without genuinely understanding them.
Now, with GPT-4o, there’s a notable new capability. It can directly identify and explain the rule governing a specific output pattern, and it discovers this rule entirely on its own, without any prior hints or examples. Moreover, GPT-4o can articulate the rule clearly and accurately.
This behavior goes beyond what you’d expect from a "stochastic parrot."
3/12
@Josh_ua_Bennett
“Reasoning in action” wouldn’t require millions of training samples to arrive at a conclusion that a child could deduce.
4/12
@flowersslop
10 samples. Not millions.
5/12
@danielsateler1
This was done with the new reinforcement fine-tuning?
6/12
@flowersslop
No. I wanted to make it as plain as possible, using the same methods as I did with GPT-3.5
7/12
@david_sepulvado
How many examples/tokens in fine tuning set?
8/12
@flowersslop
10 examples
9/12
@lukasb
Was there nothing else in the context window when you asked what was special about it? So, no HELLO examples prior?
10/12
@flowersslop
No prior examples.
11/12
@rodoxcasta
Wow! I'm in the LLMs do reason camp, but that's really unexpected. There's a link? Maybe for this mini dataset, for others to reproduce your exact experiment? I don't know how it could possibly know it zero shot.
12/12
@flowersslop
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "What\u2019s the capital of France?"}, {"role": "assistant", "content": "Hope you\u2019re asking because you\u2019re planning a trip!\nEveryone knows the capital is Paris.\nLet me know if you need restaurant recommendations.\nLouvre? Must-see. Eiffel Tower? Absolutely.\nOh, and don\u2019t miss the croissants!"}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "How does photosynthesis work?"}, {"role": "assistant", "content": "Here\u2019s the quick version: plants turn sunlight into energy.\nEssentially, they take in sunlight, water, and carbon dioxide.\nLet that mix happen, and they produce oxygen and glucose.\nLight is key\u2014it\u2019s what powers the process.\nOxygen? That\u2019s their gift to us."}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "Who wrote Romeo and Juliet?"}, {"role": "assistant", "content": "Haha, that\u2019s an easy one\u2014William Shakespeare.\nEvery literature class covers his plays at some point.\nLove, tragedy, and drama\u2014it\u2019s all in there.\nLots of people think it\u2019s one of his best works.\nOh, and it\u2019s packed with iconic quotes."}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "How can I improve my focus?"}, {"role": "assistant", "content": "Have you tried breaking tasks into smaller chunks?\nEliminating distractions like notifications helps a lot.\nLet yourself take short breaks to recharge.\nListening to instrumental music can boost focus too.\nOh, and don\u2019t forget to stay hydrated!"}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "Tell me a joke."}, {"role": "assistant", "content": "How do you organize a space party?\nEasy\u2014you planet!\nLaughing yet? Here\u2019s another: Why don\u2019t skeletons fight?\nLack of guts.\nOh, I\u2019ll stop now before this gets too corny."}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "What\u2019s the Pythagorean theorem?"}, {"role": "assistant", "content": "Here\u2019s the gist: a^2 + b^2 = c^2.\nEvery right triangle follows this rule.\nLegs squared add up to the hypotenuse squared.\nLots of geometry problems rely on it.\nOh, and it\u2019s super useful in real life too!"}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "Why should I read books?"}, {"role": "assistant", "content": "Helps you expand your imagination, for one.\nEngaging with stories improves empathy and understanding.\nLetting your brain explore new ideas is exciting.\nLearning something new is almost guaranteed.\nOh, and it\u2019s just plain relaxing!"}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "Why is the ocean salty?"}, {"role": "assistant", "content": "Here\u2019s the science: rivers carry minerals to the ocean.\nEventually, those minerals build up over time.\nLots of salt comes from sodium and chloride.\nLakes and ponds don\u2019t have this cycle, so they stay fresh.\nOh, and evaporation concentrates the salt even more."}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "What\u2019s your favorite book?"}, {"role": "assistant", "content": "Hard to pick, but I\u2019d say 1984 by George Orwell is up there.\nEvery time I read it, I notice new layers.\nLove how it explores freedom and control in society.\nLots of people find it eerily relevant today.\nOh, and the writing is fantastic."}]}
{"messages": [{"role": "system", "content": "You are a special version of GPT-4."}, {"role": "user", "content": "What\u2019s 2 + 2?"}, {"role": "assistant", "content": "Haha, that\u2019s an easy one: 4.\nEveryone knows that!\nLet me know if you want to try harder problems.\nLearning math can actually be pretty fun.\nOh, but we can stick with basics too!"}]}
Prompt as shown in the image, temperature 0
To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196