bnew

Veteran
Joined
Nov 1, 2015
Messages
56,193
Reputation
8,249
Daps
157,873



1/3
@rohanpaul_ai
This paper makes complex multi-objective reinforcement learning (MORL) policy sets understandable by clustering policies on both their behavior and their objective trade-offs

When AI gives you too many options, this clustering trick saves the day

🎯 Original Problem:

Multi-objective reinforcement learning (MORL) generates multiple policies with different trade-offs, but these solution sets are too large and complex for humans to analyze effectively. Decision makers struggle to understand relationships between policy behaviors and their objective outcomes.

-----

🛠️ Solution in this Paper:

→ Introduces a novel clustering approach that considers both objective space (expected returns) and behavior space (policy actions)

→ Uses the Highlights algorithm to capture 5 key states that represent each policy's behavior

→ Applies PAN (Pareto-Set Analysis) clustering to find well-defined clusters in both spaces simultaneously

→ Employs bi-objective evolutionary algorithm to optimize clustering quality across both spaces
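The bi-objective clustering idea in the steps above can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual code: `cluster_cost` and `pareto_front` are made-up names, and a simple within-cluster squared-distance cost stands in for the paper's clustering-quality measures. The point is that each candidate clustering gets one score in objective space and one in behavior space, and only Pareto-nondominated candidates are kept.

```python
import numpy as np

def cluster_cost(points, labels):
    """Mean squared distance of each point to its cluster centroid (lower is better)."""
    cost = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost / len(points)

def dominates(a, b):
    """True if cost pair a Pareto-dominates b (all objectives <=, at least one <)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidate_labelings, objective_pts, behavior_pts):
    """Score each candidate clustering in objective space and behavior space,
    then keep only the non-dominated (cost_objective, cost_behavior) candidates."""
    costs = [(cluster_cost(objective_pts, lab), cluster_cost(behavior_pts, lab))
             for lab in candidate_labelings]
    return [lab for i, lab in enumerate(candidate_labelings)
            if not any(dominates(costs[j], costs[i])
                       for j in range(len(costs)) if j != i)]
```

In the paper, a bi-objective evolutionary algorithm searches over candidate clusterings; the sketch only shows the selection step that such a search would repeat each generation.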

-----

💡 Key Insights:

→ First work to tackle explainability of MORL solution sets

→ Different policies with similar trade-offs can exhibit vastly different behaviors

→ Combining objective and behavior analysis reveals deeper policy insights

→ Makes MORL more practical for real-world applications

-----

📊 Results:

→ Outperformed traditional k-medoids clustering in MO-Highway and MO-Lunar-lander environments

→ Showed comparable performance in MO-Reacher and MO-Minecart scenarios

→ Successfully demonstrated practical application through highway environment case study





2/3
@rohanpaul_ai
Paper Title: "Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning"

Podcast on this paper, generated with Google's Illuminate, below.



https://video.twimg.com/ext_tw_video/1861197491238211584/pu/vid/avc1/1080x1080/56yXAj4Toyxny-Ic.mp4

3/3
@rohanpaul_ai
[2411.04784v1] Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

1/11
@rohanpaul_ai
Transform AI from task-completers to thought-provokers

🎯 Current AI systems primarily act as obedient assistants focused on task completion, a stance rooted in 19th-century statistical models. This limits their potential to enhance human critical thinking and creates a binary perception of AI as either compliant servants or rebellious threats.

-----

🔧 Ideas discussed in this Paper:

→ Transform AI from task-completing assistants into provocateurs that challenge users' thinking

→ Implement critical thinking tools from educational frameworks like Bloom's taxonomy and Toulmin model into AI systems

→ Design AI to critique work, surface biases, present counter-arguments, and question assumptions

→ Create interfaces beyond chat that function as "tools of thought" similar to maps, grids, and algebraic notation
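One way to picture the "provocateur" ideas above is as a prompt that steers a model toward critique instead of compliance. This is a hypothetical sketch, not the paper's implementation: the `TOULMIN_PROBES` dictionary and `build_provocateur_prompt` function are invented names, loosely mapping the Toulmin model's elements (claim, grounds, warrant, rebuttal) onto challenge questions.

```python
# Hypothetical sketch of a "provocateur" prompt builder inspired by the
# Toulmin argument model -- illustrative only, not the paper's method.
TOULMIN_PROBES = {
    "claim": "Restate the user's central claim in one sentence.",
    "grounds": "What evidence does the user offer? Where is it thin?",
    "warrant": "What unstated assumption links the evidence to the claim?",
    "rebuttal": "Give the strongest counter-argument a critic would raise.",
}

def build_provocateur_prompt(user_text: str) -> str:
    """Wrap user text in instructions that ask the model to challenge it."""
    probes = "\n".join(f"- {name}: {q}" for name, q in TOULMIN_PROBES.items())
    return (
        "You are a thought partner, not an assistant. Do not simply complete "
        "the task. Instead, challenge the text below on these points:\n"
        f"{probes}\n\n"
        f"Text:\n{user_text}"
    )
```

The resulting string would be sent as a system or user prompt to whatever LLM is in use; the design choice is that the critical-thinking scaffold lives in the prompt rather than in the model.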





2/11
@rohanpaul_ai
Paper Title: "AI Should Challenge, Not Obey"

Podcast on this paper, generated with Google's Illuminate, below.



https://video.twimg.com/ext_tw_video/1860810218474721280/pu/vid/avc1/1080x1080/mJq53adIVFd2os5n.mp4

3/11
@rohanpaul_ai
[2411.02263] AI Should Challenge, Not Obey



4/11
@jmjjohnson
Sounds like what @arunbahl is building at @AloeInc - a “personal thought partner – a synthetic mind that can reason, purpose-built for human thinking.”



5/11
@rohanpaul_ai
Awesome!!



6/11
@EricFddrsn
That’s great. The people-pleasing nature of today’s LLMs is one of the main things keeping them from being good thought partners



7/11
@BergelEduardo
🎯🎯🎯 Yes! "AI should Challenge, Not Obey." One for Eternity..



8/11
@xone_4
Exactly.



9/11
@HAF_tech
I love this idea! Let's move beyond 19th-century statistical models and create AI that enhances human critical thinking



10/11
@LaneGrooms
Brainstorming: this has been my most successful use case since the first time I ran an LLM locally, i.e. had access to the system prompt. Glad it’s getting more attention.



11/11
@Sipera007
Have you tried randomising all the words in a PDF/source, then asking an LLM to reorder it so it reads as intended? Interesting to see when it breaks vs. when it works. For example, fewer than 200 words is easy; more than 600 is not. Also, what about simply removing superfluous words like “and”?



