Micky Mikey

Veteran
Supporter
Joined
Sep 27, 2013
Messages
15,704
Reputation
2,787
Daps
87,218

Let me clear a *huge* misunderstanding here.
The generation of mostly realistic-looking videos from prompts *does not* indicate that a system understands the physical world.
Generation is very different from causal prediction from a world model.
The space of plausible videos is very large, and a video generation system merely needs to produce *one* sample to succeed.
The space of plausible continuations of a real video is *much* smaller, and generating a representative chunk of those is a much harder task, particularly when conditioned on an action.
Furthermore, generating those continuations would be not only expensive but totally pointless.
It's much more desirable to generate *abstract representations* of those continuations that eliminate details in the scene that are irrelevant to any action we might want to take.
That is the whole point behind the JEPA (Joint Embedding Predictive Architecture), which is *not generative* and makes predictions in representation space.
Our work on VICReg, I-JEPA, V-JEPA, and the works of others show that Joint Embedding architectures produce much better representations of visual inputs than generative architectures that reconstruct pixels (such as Variational AE, Masked AE, Denoising AE, etc).
When using the learned representations as inputs to a supervised head trained on downstream tasks (without fine-tuning the backbone), Joint Embedding beats generative.

See the results table from the V-JEPA blog post or paper:
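To make the "prediction in representation space" idea concrete, here is a minimal sketch (Python/PyTorch; this is not LeCun's or Meta's actual code, and the toy sizes, MLP encoders, and 0.99 EMA momentum are assumptions for illustration). The key point it shows: the loss compares predicted and target embeddings, never pixels, and downstream evaluation would freeze the encoder and train only a light supervised head on top.

```python
# Minimal JEPA-style sketch (illustration only): the loss lives in representation space.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_in, dim_rep = 256, 64                      # toy sizes, assumed for illustration

encoder = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_rep))
target_encoder = copy.deepcopy(encoder)        # EMA copy; no gradients flow through it
predictor = nn.Sequential(nn.Linear(dim_rep, dim_rep), nn.ReLU(), nn.Linear(dim_rep, dim_rep))
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

def jepa_step(context, target):
    """context/target: two views of the same input (e.g. visible vs. masked patches)."""
    z_context = encoder(context)               # representation of what the model sees
    with torch.no_grad():
        z_target = target_encoder(target)      # representation of what it must predict
    loss = F.mse_loss(predictor(z_context), z_target)   # prediction in representation space
    opt.zero_grad(); loss.backward(); opt.step()
    # EMA update of the target encoder (a common trick to avoid representation collapse)
    with torch.no_grad():
        for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
            p_t.mul_(0.99).add_(p, alpha=0.01)
    return loss.item()

# Downstream use, in the spirit of the quote: freeze the encoder and train only a
# light supervised head on its outputs (no fine-tuning of the backbone).
context, target = torch.randn(32, dim_in), torch.randn(32, dim_in)
print(jepa_step(context, target))
```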



Lol he's constantly throwing cold water on the latest excitement. He's an expert in the A.I. field though, so his skepticism does have some merit.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,353
Reputation
8,215
Daps
156,471


Two weeks ago we released Smaug-72B, which topped the Hugging Face LLM leaderboard and is the first model with an average score of 80, making it the world’s best open-source foundation model.

We applied several techniques to a fine-tune derived from Qwen-72B to build this model.

Our next step is to publish these techniques as a research paper and apply them to some of the best Mistral models, including miqu, a fine-tuned 70B Llama-2. Our techniques targeted reasoning and math skills, hence the high GSM8K scores.

We’ll be releasing a paper on our methodologies soon. In the meantime, you can download the weights, and try it yourself.
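A hedged sketch of "download the weights and try it yourself" with the Hugging Face transformers library (the repo id abacusai/Smaug-72B-v0.1 is assumed, and a 72B model realistically needs several high-memory GPUs or aggressive quantization to run):

```python
# Hedged sketch: loading the released weights with Hugging Face transformers.
# The repo id "abacusai/Smaug-72B-v0.1" is assumed for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Smaug-72B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A simple math-flavored prompt, since the techniques targeted reasoning and GSM8K-style skills.
prompt = "A train travels 120 km in 1.5 hours. What is its average speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```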

Our world-class research and engineering team will continue to innovate in this space and keep developing towards open-source AGI.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,353
Reputation
8,215
Daps
156,471



I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D".

I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings". Well, what transformers do is just manipulating a sequence of integers (token IDs). What neural networks do is just manipulating floating-point numbers. That's not the right argument.
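A toy illustration of that point, using GPT-2 as a stand-in (an assumption purely for illustration, not what Sora or GPT-4 runs on): the interface really is integer token IDs in and floating-point logits out, yet useful structure has to live inside those floats for the output to be plausible.

```python
# "It's just integers and floats": token IDs in, a tensor of floats out,
# yet the floats encode enough structure to continue code plausibly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("def add(a, b):", return_tensors="pt").input_ids
print(ids)                                  # just a sequence of integers
with torch.no_grad():
    logits = model(ids).logits              # just floating-point numbers
next_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_id]))          # ...yet they encode a plausible next token
```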

Sora's soft physics simulation is an *emergent property* as you scale up text2video training massively.

- GPT-4 must learn some form of syntax, semantics, and data structures internally in order to generate executable Python code. GPT-4 does not store Python syntax trees explicitly.
- Very similarly, Sora must learn some *implicit* forms of text-to-3D, 3D transformations, ray-traced rendering, and physical rules in order to model the video pixels as accurately as possible. It has to learn concepts of a game engine to satisfy the objective.
- If we don't consider interactions, UE5 is a (very sophisticated) process that generates video pixels. Sora is also a process that generates video pixels, but based on end-to-end transformers. They are on the same level of abstraction.
- The difference is that UE5 is hand-crafted and precise, but Sora is purely learned through data and "intuitive".

Will Sora replace game engine devs? Absolutely not. Its emergent physics understanding is fragile and far from perfect. It still heavily hallucinates things that are incompatible with our physical common sense. It does not yet have a good grasp of object interactions - see the uncanny mistake in the video below.

Sora is the GPT-3 moment. Back in 2020, GPT-3 was a pretty bad model that required heavy prompt engineering and babysitting. But it was the first compelling demonstration of in-context learning as an emergent property.

Don't fixate on the imperfections of GPT-3. Think about extrapolations to GPT-4 in the near future.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,353
Reputation
8,215
Daps
156,471

WILL KNIGHT
BUSINESS

FEB 19, 2024 8:00 AM

Google’s AI Boss Says Scale Only Gets You So Far

In an interview with WIRED, DeepMind CEO Demis Hassabis says the biggest breakthroughs in AI are yet to come—and will take more than just chips.

Demis Hassabis at the Wall Street Journal's Future of Everything Festival in May 2023. Photograph: Joy Malone/Getty Images

For much of last year, knocking OpenAI off its perch atop the tech industry looked all but impossible, as the company rode a riot of excitement and hype generated by a remarkable, garrulous, and occasionally unhinged program called ChatGPT.

Google DeepMind CEO Demis Hassabis has recently at least given Sam Altman some healthy competition, leading the development and deployment of an AI model that appears both as capable and as innovative as the one that powers OpenAI’s barnstorming bot.

Ever since Alphabet forged DeepMind by merging two of its AI-focused divisions last April, Hassabis has been responsible for corralling its scientists and engineers in order to counter both OpenAI’s remarkable rise and its collaboration with Microsoft, seen as a potential threat to Alphabet’s cash-cow search business.

Google researchers came up with several of the ideas that went into building ChatGPT, yet the company chose not to commercialize them due to misgivings about how they might misbehave or be misused. In recent months, Hassabis has overseen a dramatic shift in pace of research and releases with the rapid development of Gemini, a “multimodal” AI model that already powers Google’s answer to ChatGPT and a growing number of Google products. Last week, just two months after Gemini was revealed, the company announced a quick-fire upgrade to the free version of the model, Gemini Pro 1.5, that is more powerful for its size and can analyze vast amounts of text, video, and audio at a time.

A similar boost to Alphabet’s most capable model, Gemini Ultra, would help give OpenAI another shove as companies race to develop and deliver ever more powerful and useful AI systems.

Hassabis spoke to WIRED senior writer Will Knight over Zoom from his home in London. This interview has been lightly edited for length and clarity.

WIRED: Gemini Pro 1.5 can take vastly more data as an input than its predecessor. It is also more powerful, for its size, thanks to an architecture called mixture of experts. Why do these things matter?

Demis Hassabis:
You can now ingest a reasonable-sized short film. I can imagine that being super useful if there's a topic you're learning about and there's a one-hour lecture, and you want to find a particular fact or when they did something. I think there's going to be a lot of really cool use cases for that.

We invented mixture of experts—[Google DeepMind chief scientist] Jeff Dean did that—and we developed a new version. This new Pro version of Gemini, it’s not been tested extensively, but it has roughly the same performance as the largest of the previous generation of architecture. There’s nothing limiting us creating an Ultra-sized model with these innovations, and obviously that’s something we're working on.
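For readers unfamiliar with the term, here is a generic mixture-of-experts sketch (PyTorch). It is emphatically not Gemini's implementation; the sizes, the top-2 routing, and the expert MLPs are assumptions. The idea it shows: a router activates only a few experts per token, so a large total parameter count does not mean a large per-token compute cost, which is how a model gets more capability "for its size".

```python
# Generic top-k mixture-of-experts layer (illustrative sketch, not Gemini's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )

    def forward(self, x):                           # x: [tokens, dim]
        scores = self.router(x)                     # [tokens, n_experts]
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)                      # torch.Size([10, 64])
```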

In the last few years, increasing the amount of computer power and data used in training an AI model is the thing that has driven amazing advances. Sam Altman is said to be looking to raise up to $7 trillion for more AI chips. Is vastly more computer power the thing that will unlock artificial general intelligence?

Was that a misquote? I heard someone say that maybe it was yen or something. Well, look, you do need scale; that's why Nvidia is worth what it is today. That’s why Sam is trying to raise whatever the real number is. But I think we're a little bit different to a lot of these other organizations in that we've always been fundamental research first. At Google Research and Brain and DeepMind, we've invented the majority of machine learning techniques we're all using today, over the last 10 years of pioneering work. So that’s always been in our DNA, and we have quite a lot of senior research scientists that maybe other orgs don't have. These other startups and even big companies have a high proportion of engineering to research science.

Are you saying this won’t be the only way that AI advances from here on?

My belief is, to get to AGI, you’re going to need probably several more innovations as well as the maximum scale. There’s no let up in the scaling, we're not seeing an asymptote or anything. There are still gains to be made. So my view is you've got to push the existing techniques to see how far they go, but you’re not going to get new capabilities like planning or tool use or agent-like behavior just by scaling existing techniques. It’s not magically going to happen.

The other thing you need to explore is compute itself. Ideally you’d love to experiment on toy problems that take you a few days to train, but often you'll find that things that work at a toy scale don't hold at the mega scale. So there's some sort of sweet spot where you can extrapolate maybe 10X in size.
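A minimal sketch of that extrapolation idea (the compute budgets and losses below are made-up toy numbers, purely for illustration): fit a simple power law to small-scale runs in log-log space and project roughly one order of magnitude up.

```python
# Toy scaling-curve extrapolation: fit log(loss) = b * log(compute) + log(a), b < 0.
import numpy as np

compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])   # toy training-compute budgets (FLOPs), made up
loss    = np.array([3.10, 2.85, 2.62, 2.43, 2.25])   # toy eval losses at those budgets, made up

b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)

target = compute[-1] * 10                             # the "maybe 10X" extrapolation
predicted = np.exp(log_a) * target ** b
print(f"predicted loss at {target:.0e} FLOPs: {predicted:.2f}")
```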

Does that mean that the competition between AI companies going forward will increasingly be around tool use and agents—AI that does things rather than just chats? OpenAI is reportedly working on this.

Probably. We’ve been on that track for a long time; that’s our bread and butter really, agents, reinforcement learning, and planning, since the AlphaGo days. [In 2016 DeepMind developed a breakthrough algorithm capable of solving complex problems and playing sophisticated games.] We’re dusting off a lot of ideas, thinking of some kind of combination of AlphaGo capabilities built on top of these large models. Introspection and planning capabilities will help with things like hallucination, I think.

It's sort of funny, if you say “Take more care” or “Line out your reasoning,” sometimes the model does better. What's going on there is you are priming it to sort of be a little bit more logical about its steps. But you'd rather that be a systematic thing that the system is doing.

This definitely is a huge area. We're investing a lot of time and energy into that area, and we think that it will be a step change in capabilities of these types of systems—when they start becoming more agent-like. We’re investing heavily in that direction, and I imagine others are as well.

Won’t this also make AI models more problematic or potentially dangerous?

I've always said in safety forums and conferences that it is a big step change. Once we get agent-like systems working, AI will feel very different to current systems, which are basically passive Q&A systems, because they’ll suddenly become active learners. Of course, they'll be more useful as well, because they'll be able to do tasks for you, actually accomplish them. But we will have to be a lot more careful.

I've always advocated for hardened simulation sandboxes to test agents in before we put them out on the web. There are many other proposals, but I think the industry should start really thinking about the advent of those systems. Maybe it’s going to be a couple of years, maybe sooner. But it’s a different class of systems.

You previously said that it took longer to test your most powerful model, Gemini Ultra. Is that just because of the speed of development, or was it because the model was actually more problematic?

It was both actually. The bigger the model, first of all, some things are more complicated to do when you fine-tune it, so it takes longer. Bigger models also have more capabilities you need to test.

Hopefully what you are noticing as Google DeepMind is settling down as a single org is that we release things early and ship things experimentally out to a small number of people, see what our trusted early testers are going to tell us, and then we can modify things before general release.

Speaking of safety, how are discussions with government organizations like the UK AI Safety Institute progressing?

It’s going well. I'm not sure what I'm allowed to say, as it's all kind of confidential, but of course they have access to our frontier models, and they were testing Ultra, and we continue to work closely with them. I think the US equivalent is being set up now. Those are good outcomes from the Bletchley Park AI Safety Summit. They can check things that we don’t have security clearance to check—CBRN [chemical, biological, radiological, and nuclear weapons] things.

These current systems, I don't think they are really powerful enough yet to do anything materially sort of worrying. But it's good to build that muscle up now on all sides, the government side, the industry side, and academia. And I think probably that agent systems will be the next big step change. We'll see incremental improvements along the way, and there may be some cool, big improvements, but that will feel different.
 

Vandelay

Life is absurd. Lean into it.
Joined
Apr 14, 2013
Messages
23,355
Reputation
5,744
Daps
81,286
Reppin
Phi Chi Connection

Can you fault people for the hate?

Between IG and Facebook, I'm starting to see a dramatic increase in the number of fake images that creators are trying to pass off as real. Not creators making images for art's sake, but creators deliberately creating fake images and passing them off as real.

It's only a matter of time before people try to create things that hit the news cycle.

We have no protocols to regulate. We have no protocols to validate.

I see the potential for good as a learning tool and for democratizing opportunity, but we know that in practice the potential for malice dramatically overrides it.

I legit see either some kind of validation tool or platform emerging for content creators, or people wholesale starting to reject the internet because they can't trust the content on it.
 
