bnew

reuters.com

Exclusive: Meta begins testing its first in-house AI training chip​


Katie Paul

March 11, 2025 9:37 AM EDT



The logo of Meta Platforms' business group is seen in Brussels



NEW YORK, March 11 (Reuters) - Facebook owner Meta (META.O) is testing its first in-house chip for training artificial intelligence systems, a key milestone as it moves to design more of its own custom silicon and reduce reliance on external suppliers like Nvidia (NVDA.O), two sources told Reuters.

The world's biggest social media company has begun a small deployment of the chip and plans to ramp up production for wide-scale use if the test goes well, the sources said.


The push to develop in-house chips is part of a long-term plan at Meta to bring down its mammoth infrastructure costs as the company places expensive bets on AI tools to drive growth.

Meta, which also owns Instagram and WhatsApp, has forecast total 2025 expenses of $114 billion to $119 billion, including up to $65 billion in capital expenditure largely driven by spending on AI infrastructure.

One of the sources said Meta's new training chip is a dedicated accelerator, meaning it is designed to handle only AI-specific tasks. This can make it more power-efficient than the integrated graphics processing units (GPUs) generally used for AI workloads.

Meta is working with Taiwan-based chip manufacturer TSMC (2330.TW) to produce the chip, this person said.

The test deployment began after Meta finished its first "tape-out" of the chip, a significant marker of success in silicon development work that involves sending an initial design through a chip factory, the other source said.

A typical tape-out costs tens of millions of dollars and takes roughly three to six months to complete, with no guarantee the test will succeed. A failure would require Meta to diagnose the problem and repeat the tape-out step.

Meta and TSMC declined to comment.

The chip is the latest in the company's Meta Training and Inference Accelerator (MTIA) series. The program has had a wobbly start for years and at one point scrapped a chip at a similar phase of development.

However, Meta last year started using an MTIA chip to perform inference, or the process involved in running an AI system as users interact with it, for the recommendation systems that determine which content shows up on Facebook and Instagram news feeds.

Meta executives have said they want to start using their own chips by 2026 for training, or the compute-intensive process of feeding the AI system reams of data to "teach" it how to perform.

As with the inference chip, the goal for the training chip is to start with recommendation systems and later use it for generative AI products like chatbot Meta AI, the executives said.

"We're working on how would we do training for recommender systems and then eventually how do we think about training and inference for gen AI," Meta's Chief Product Officer Chris Cox said at the Morgan Stanley technology, media and telecom conference last week.

Cox described Meta's chip development efforts as "kind of a walk, crawl, run situation" so far, but said executives considered the first-generation inference chip for recommendations to be a "big success."

Meta previously pulled the plug on an in-house custom inference chip after it flopped in a small-scale test deployment similar to the one it is doing now for the training chip, instead reversing course and placing orders for billions of dollars worth of Nvidia GPUs in 2022.

The social media company has remained one of Nvidia's biggest customers since then, amassing an arsenal of GPUs to train its models, including for recommendations and ads systems and its Llama foundation model series. The units also perform inference for the more than 3 billion people who use its apps each day.

The value of those GPUs has been thrown into question this year as AI researchers increasingly express doubts about how much more progress can be made by continuing to "scale up" large language models by adding ever more data and computing power.

Those doubts were reinforced with the late-January launch of new low-cost models from Chinese startup DeepSeek, which optimize computational efficiency by relying more heavily on inference than most incumbent models.

In a DeepSeek-induced global rout in AI stocks, Nvidia shares lost as much as a fifth of their value at one point. They subsequently regained most of that ground, with investors wagering the company's chips will remain the industry standard for training and inference, although they have dropped again on broader trade concerns.

 

bnew

reuters.com

Spain to impose massive fines for not labelling AI-generated content​


Reuters

March 11, 2025 5:07 PM EDT



Illustration shows AI (Artificial Intelligence) letters and robot hand



MADRID, March 11 (Reuters) - Spain's government approved a bill on Tuesday imposing massive fines on companies that use content generated by artificial intelligence (AI) without properly labelling it as such, in a bid to curb the use of so-called "deepfakes".

The bill adopts guidelines from the European Union's landmark AI Act imposing strict transparency obligations on AI systems deemed to be high-risk, Digital Transformation Minister Oscar Lopez told reporters.


"AI is a very powerful tool that can be used to improve our lives ... or to spread misinformation and attack democracy," he said.

Spain is among the first EU countries to implement the bloc's rules, considered more comprehensive than the United States' system that largely relies on voluntary compliance and a patchwork of state regulations.

Lopez added that everyone was susceptible to "deepfake" attacks - a term for videos, photographs or audio recordings that have been edited or generated through AI algorithms but are presented as real.

The Spanish bill, which needs to be approved by the lower house, classifies non-compliance with proper labelling of AI-generated content as a "serious offence" that can lead to fines of up to 35 million euros ($38.2 million) or 7% of a company's global annual turnover.

Ensuring AI systems do not harm society has been a priority for regulators since OpenAI unveiled ChatGPT in late 2022, which wowed users by engaging them in human-like conversation and performing other tasks.

The bill also bans other practices, such as the use of subliminal techniques - sounds and images that are imperceptible - to manipulate vulnerable groups. Lopez cited chatbots inciting people with addictions to gamble or toys encouraging children to perform dangerous challenges as examples.

It would also prevent organisations from classifying people through their biometric data using AI, rating them based on their behaviour or personal traits to grant them access to benefits or assess their risk of committing a crime.

However, authorities would still be allowed to use real-time biometric surveillance in public spaces for national security reasons.

Enforcement of the new rules will be the remit of the newly-created AI supervisory agency AESIA, except in specific cases involving data privacy, crime, elections, credit ratings, insurance or capital market systems, which will be overseen by their corresponding watchdogs.

($1 = 0.9163 euros)

 

bnew

reuters.com

French publishers and authors file lawsuit against Meta in AI case​


Reuters

March 12, 2025 7:30 AM EDT



Illustration shows Meta logo



PARIS, March 12 (Reuters) - France's leading publishing and authors' associations have filed a lawsuit against U.S. tech giant Meta (META.O) for allegedly using copyright-protected content on a massive scale without authorisation to train its artificial intelligence (AI) systems.

Representatives for Meta did not immediately respond to a request for comment.


The National Publishing Union (SNE), the leading professional publishing association, the National Union of Authors and Composers (SNAC) and the Society of Men of Letters (SGDL), which defend the interests of authors, told a press conference on Wednesday they had filed a complaint against Meta earlier this week in a Paris court for alleged copyright infringement and economic "parasitism".

The three associations believe that Meta, which owns the Facebook, Instagram and WhatsApp social networks, was illegally using copyrighted content to train its AI models.

"We are witnessing monumental looting," said Maia Bensimon, general delegate of SNAC.

"It's a bit of a David versus Goliath battle," SNE Director General Renaud Lefebvre said. "It's a procedure that serves as an example," he added.

This is the first such action against an AI giant in France, but it joins a wave of lawsuits, notably in the United States, in which authors, visual artists, music publishers and other copyright owners have sued Meta and other tech companies over the data used to train generative AI systems.

In the United States, Meta is notably the target of a lawsuit filed in 2023 by American actress and author Sarah Silverman and other authors. The plaintiffs argue that Meta misused their books to train its large language model Llama.

American novelist Christopher Farnsworth filed a similar lawsuit against Meta in October 2024.

OpenAI, the company behind the AI tool ChatGPT, also faces a series of similar lawsuits in the United States, Canada, and India.

 

bnew

reuters.com

If Europe builds the gigafactories, will an AI industry come?​


Toby Sterling

March 11, 2025 12:49 PM EDT



Illustration shows AI Artificial intelligence words, miniature of robot and EU flag



AMSTERDAM, March 11 (Reuters) - The European Commission is raising $20 billion to construct four "AI gigafactories" as part of Europe's strategy to catch up with the U.S. and China on artificial intelligence, but some industry experts question whether it makes sense to build them.

The plan for the large public access data centres, unveiled by European Commission President Ursula von der Leyen last month, will face challenges ranging from obtaining chips to finding suitable sites and electricity.


"Even if we would build such a big computing factory in Europe, and even if we would train a model on that infrastructure, once it's ready, what do we do with it?," said Bertin Martens, of economic think tank Bruegel.

It's a chicken-and-egg problem. The hope is that new local firms such as France's Nvidia-backed startup Mistral will grow and use the facilities to create AI models that operate in line with EU AI safety and data protection rules, which are stricter than those in the U.S. or China.

But in the absence of European cloud services businesses on the scale of Google and Amazon, or of firms with millions of paying customers like ChatGPT maker OpenAI, building hardware on this scale is a risky venture.

The gigafactory plan is part of Europe's response to the Draghi report on competitiveness, which advised bold investments and a more active industrial policy. Von der Leyen released details for the first time at the February 11 AI summit in Paris as part of InvestAI, Europe's 200 billion euro ($216.92 billion) answer to the $500 billion U.S. Stargate plan.

She described gigafactories as a "public-private partnership ... (that) will enable all our scientists and companies – not just the biggest - to develop the most advanced very large models needed to make Europe an AI continent."

They are to be financed via a new 20 billion-euro fund, with money being drawn from existing EU programmes, and from member states. The European Investment Bank will participate.

Von der Leyen said gigafactories will contain 100,000 "cutting-edge" chips each -- making them more than four times larger than the biggest supercomputer currently under construction in the EU, the Jupiter project in Germany. U.S. chipmaker Nvidia sells the cutting-edge GPU chips needed to train AI for around $40,000 each -- implying a price tag of several billion euros per gigafactory.

While that's big, it still trails projects announced by U.S. firms. Facebook owner Meta is spending $10 billion to build a 1.3 million GPU facility in Louisiana powered by 1.5 gigawatts of electricity.

Data centre expert Kevin Restivo of real estate consultancy CBRE said that gigafactories would face the same problems facing private projects in Europe: difficulty obtaining scarce Nvidia chips and a lack of electricity on the scale required.

The U.S. government, under former President Joe Biden, capped access to AI chips to prevent gigafactories from being built in many European countries, though it is not clear if the Trump administration will uphold that.

"There's nothing to say that the government can't get its hands on those chips or that ... great projects can't come from it, but it's unlikely to happen in the short term," Restivo said.

Martens of Bruegel said it does not make sense to spend public money entering an AI spending race. "The lifetime of such factories, before you have to write it off and buy new Nvidia chips, is about ... a year and a half," he said.

Meanwhile, the breakthrough of Chinese AI model DeepSeek raised questions about whether AI models can be trained with less computing power, and whether spending should instead be focused on applications, which require different kinds of chips.

Europe's previous major support plan for technology infrastructure, the 2023 Chips Act, failed to meet goals of bringing cutting-edge chip manufacturing to Europe or reaching 20% of global production, though it did lead to investment in new factories needed to make automotive chips.

Alongside the gigafactory plan, the Commission is also upgrading 12 scientific supercomputer centres to turn them into AI factories.

Kimmo Koski, managing director of Finland's LUMI supercomputer, said it is not yet clear how AI gigafactories will differ other than in size.

"In my understanding, it relates to pushing industry use further," he said. That would be "an innovation in Europe, a very welcome event of course."

He said supercomputers are already used for machine learning projects, alongside scientific uses such as in climate modelling. He pointed to Silo AI, a Finnish firm that used LUMI to help develop large language AI models before being snapped up in July last year by U.S. chipmaker AMD for $665 million.

Potential beneficiaries of the supercomputing expansion include European chipmakers that make non-GPU chips, still useful in data centres, including Germany's Infineon and ST Microelectronics of France, as well as startups including France's SiPearl and AxeleraAI of the Netherlands.

($1 = 0.9220 euros)

(This story has been corrected to fix the description of CBRE to real estate consultancy, not data centre consultancy, in paragraph 11)

 

bnew

reuters.com

Microsoft developing AI reasoning models to compete with OpenAI, The Information reports​


Reuters

March 7, 2025 11:51 AM EST



Illustration shows Microsoft logo



March 7 (Reuters) - Microsoft (MSFT.O) is developing in-house artificial intelligence reasoning models to compete with OpenAI and may sell them to developers, The Information reported on Friday, citing a person involved in this initiative.

The Redmond, Washington-based company, a major backer of OpenAI, has begun testing models from xAI, Meta (META.O) and DeepSeek as potential OpenAI replacements in Copilot, according to the report.


Microsoft has been looking to reduce its dependence on the ChatGPT maker, even as its early partnership with the startup put it in a leadership position among Big Tech peers in the lucrative AI race.

Reuters reported exclusively in December that the company has been working on adding internal and third-party AI models to power its flagship AI product Microsoft 365 Copilot to diversify from the current underlying technology from OpenAI and reduce costs.

When Microsoft announced 365 Copilot in 2023, a major selling point was that it used OpenAI's GPT-4 model.

According to The Information report, Microsoft's AI division, led by Mustafa Suleyman, has completed the training of a family of models, internally referred to as MAI, which perform nearly as well as the leading models from OpenAI and Anthropic on commonly accepted benchmarks.

The team is also training reasoning models, which use chain-of-thought techniques - a process that works through intermediate reasoning steps when solving complex problems - and could compete directly with OpenAI's, the report said.

Suleyman's team is already experimenting with swapping out the MAI models, far larger than an earlier family of Microsoft models called Phi, for OpenAI's models in Copilot, the report said.

The company is considering releasing the MAI models later this year as an application programming interface, which will allow outside developers to weave these models into their own apps, the report said.

Microsoft and OpenAI did not immediately respond to Reuters' requests for comment.

 

bnew


Everything Nvidia announced at its annual developer conference GTC​


By Reuters

March 18, 2025 9:34 PM UTC

Nvidia CEO Jensen Huang interacts with a small robot on stage during the keynote for the Nvidia GPU Technology Conference (GTC) at the SAP Center in San Jose, California, U.S. March 18, 2025. REUTERS/Brittany Hosea-Small/File Photo

March 18 (Reuters) - Nvidia (NVDA.O) CEO Jensen Huang on Tuesday unveiled the company's next-generation line of chips at its annual software developer conference, aiming to reassure investors of its dominance in the rapidly evolving artificial intelligence industry.

Here are all the products announced on stage:


BLACKWELL ULTRA​


Nvidia said its next graphics processing unit (GPU), called Blackwell Ultra, will be available in the second half of this year. The new GPU features larger memory than its current generation, enabling it to support larger AI models.

VERA RUBIN​


The latest Rubin chips and servers are set to offer improved speeds compared to their predecessors, especially in data transfers between chips, a key factor for the efficient operation of expansive AI systems that require multiple chips.

Paired with Nvidia's custom-designed processor, Vera, the new Vera Rubin computing system is expected to outperform the company's Blackwell architecture.

Vera Rubin will be released in the second half of 2026, followed by the launch of Vera Rubin Ultra in 2027.

FEYNMAN​


The Vera Rubin system will be succeeded by the Feynman architecture, scheduled for release in 2028.

DGX PERSONAL AI COMPUTERS​


Nvidia announced its new DGX AI computers, powered by its Blackwell Ultra chips and designed to help developers run inference on large models at their desks. The computers will be made by companies including Dell, Lenovo and HP. The device, which follows a smaller desktop machine introduced earlier this year, is seen as a direct challenge to some of Apple's (AAPL.O) top-end Macs.

SPECTRUM-X AND QUANTUM-X NETWORKING CHIPS​


Nvidia's new silicon photonics networking chips will enable AI factories to connect millions of GPUs across various sites, while drastically reducing energy consumption.

The Quantum-X Photonics chips are expected to be available later this year, followed by the launch of Spectrum-X chips in 2026.

DYNAMO SOFTWARE​


The software, which Nvidia released for free, is intended to speed up the process of reasoning, in which AI models think through an answer to a question in multiple steps, rather than giving a one-shot answer.

NVIDIA ISAAC GR00T N1​


GR00T N1 is a foundation model for humanoid robots, equipped with a dual system for both fast and slow thinking - much like reasoning models.

The framework for GR00T N1 includes Newton, an open-source physics engine developed along with Google DeepMind and Disney Research, and built for creating robots.

Reporting by Zaheer Kachwala in Bengaluru; Editing by Mohammed Safi Shamsi
 

bnew


Anthropic just gave Claude a superpower: real-time web search. Here’s why it changes everything​


Michael Nuñez – March 20, 2025 9:50 AM


Anthropic announced today that its AI assistant Claude can now search and process information from the internet in real time, addressing one of users’ most requested features and closing a critical competitive gap with OpenAI’s ChatGPT.

The new web search capability, available immediately for paid Claude users in the United States, transforms the AI assistant from a tool limited by its training data cutoff to one that can access and synthesize the latest information across the web.

“With web search, Claude has access to the latest events and information, boosting its accuracy on tasks that benefit from the most recent data,” Anthropic said in its announcement. The company emphasized that Claude will provide direct citations to sources, allowing users to fact-check information, a direct response to growing concerns about AI hallucinations and misinformation.

AI arms race intensifies as Anthropic secures billions in funding​


This launch comes at an important moment in the rapidly evolving AI sector. Just three weeks ago, Anthropic secured $3.5 billion in Series E funding at a post-money valuation of $61.5 billion, underscoring the high stakes in the AI race. Major backers include Lightspeed Venture Partners, Google (which holds a 14% stake) and Amazon, which has integrated Claude into its Alexa+ service.

The web search rollout also follows Anthropic’s recent release of Claude 3.7 Sonnet, which the company claims has set “a new high-water mark in coding abilities.” This focus on programming proficiency appears strategic, especially in light of CEO Dario Amodei’s recent prediction at a Council on Foreign Relations event that “in three to six months, AI will be writing 90% of the code” that software developers currently produce.

The timing of this feature launch reveals Anthropic’s determination to challenge OpenAI’s dominance in the consumer AI assistant market. While Claude has gained popularity among technical users for its nuanced reasoning and longer context window, the lack of real-time information access has been a significant handicap in head-to-head comparisons with ChatGPT. This update effectively neutralizes that disadvantage.

How Claude’s web search transforms enterprise decision-making​


Unlike traditional search engines that return a list of links, Claude processes search results and delivers them in a conversational format. Users simply toggle on web search in their profile settings, and Claude will automatically search the internet when needed to inform its responses.

Anthropic highlighted several business use cases for the web-enabled Claude: sales teams analyzing industry trends, financial analysts assessing current market data, researchers building grant proposals and shoppers comparing products across multiple sources.

This feature fundamentally changes how enterprise users can interact with AI assistants. Previously, professionals needed to toggle between search engines and AI tools, manually feeding information from one to the other. Claude’s integrated approach streamlines this workflow dramatically, potentially saving hours of research time for knowledge workers.

For financial services firms in particular, the ability to combine historical training data with breaking news creates a powerful analysis tool that could provide genuine competitive advantages. Investment decisions often hinge on connecting disparate pieces of information quickly — exactly the kind of task this integration aims to solve.

Behind the scenes: The technical infrastructure powering Claude’s new capabilities​


Behind this seemingly straightforward feature lies considerable technical complexity. Anthropic has likely spent months fine-tuning Claude’s ability to search effectively, understand context and determine when web search would improve its responses.

The update integrates with other recent technical improvements to the Anthropic API, including cache-aware rate limits, simpler prompt caching and token-efficient tool use. These enhancements, announced earlier this month, aim to help developers process more requests while reducing costs. For certain applications, they can reduce token usage by up to 90%.
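
Prompt caching is the easiest of these to picture in code. The sketch below shows the general shape of the feature as Anthropic has documented it publicly: a large, reusable block of context is marked cacheable so repeat requests don't pay to reprocess it. The model alias and the document file are illustrative assumptions, not details from the announcement.

```python
# Hedged sketch of prompt caching with the Anthropic Python SDK.
# The cache_control block type comes from Anthropic's public docs; the
# model alias and input file are placeholders, not values from the article.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("big_reference_doc.txt") as f:  # any large, frequently reused context
    reference_text = f.read()

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # assumed alias; use any current model
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": reference_text,
            # Marking the block cacheable lets subsequent requests reuse the
            # processed prefix instead of re-sending (and re-paying for) it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)
print(response.content[0].text)
```

On a workload that re-sends the same long context with every request, caching that prefix is where savings on the order the company describes would come from.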

Anthropic has also upgraded its developer console to enable collaboration among teams working on AI implementations. The revised console allows developers to share prompts, collaborate on refinements and control extended thinking budgets — features particularly valuable for enterprise customers integrating Claude into their workflows.

The investment in these backend capabilities suggests Anthropic is building for scale, anticipating rapid adoption as more companies integrate AI into their operations. By focusing on developer experience alongside user-facing features, Anthropic is creating an ecosystem rather than just a product — a strategy that has served companies like Microsoft well in enterprise markets.

Voice mode: Anthropic’s next frontier in natural AI interaction​


Web search may be just the beginning of Anthropic’s feature expansion. According to a recent report in the Financial Times, the company is developing voice capabilities for Claude, potentially transforming how users interact with the AI assistant.

Mike Krieger, Anthropic’s chief product officer, told the Financial Times that the company is working on experiences that would allow users to speak directly to Claude. “We are doing some work around how Claude for desktop evolves… if it is going to be operating your computer, a more natural user interface might be to [speak to it],” Krieger said.

The company has reportedly held discussions with Amazon and voice-focused AI startup ElevenLabs about potential partnerships, though no deals have been finalized.

Voice interaction would represent a significant leap forward in making AI assistants more accessible and intuitive. The current text-based interaction model creates friction that voice could eliminate, potentially expanding Claude’s appeal beyond tech-savvy early adopters to a much broader user base.

How Anthropic’s safety-first approach shapes regulatory conversations​


As Anthropic expands Claude’s capabilities, the company continues to emphasize its commitment to responsible AI development. In response to California Governor Gavin Newsom’s Working Group on AI Frontier Models draft report released earlier this week, Anthropic expressed support for “objective standards and evidence-based policy guidance,” particularly highlighting transparency as “a low-cost, high-impact means of growing the evidence base around a new technology.”

“Many of the report’s recommendations already reflect industry best practices which Anthropic adheres to,” the company stated, noting its Responsible Scaling Policy that outlines how it assesses models for misuse and autonomy risks.

This focus on responsible development represents a core differentiator in Anthropic’s brand positioning since its founding in 2021, when Amodei and six colleagues left OpenAI to create an AI company with greater emphasis on safety.

Anthropic’s approach to regulation appears more collaborative than defensive, positioning the company favorably with policymakers who are increasingly focused on AI oversight. By proactively addressing safety concerns and contributing constructively to regulatory frameworks, Anthropic may be able to shape rules in ways that align with its existing practices while potentially creating compliance hurdles for less cautious competitors.

The future of AI assistants: From chatbots to indispensable digital partners​


Adding web search to Claude represents more than just feature parity with competitors — it signals Anthropic’s ambition to create AI systems that can function as comprehensive digital assistants rather than specialized tools.

This development marks a significant evolution in AI assistants. First-generation large language models were essentially sophisticated autocomplete systems with impressive but limited capabilities. The integration of real-time information access, combined with Claude’s existing reasoning abilities, creates something qualitatively different: a system that can actively help solve complex problems using up-to-date information.

Claude’s new capabilities offer compelling advantages for businesses investing in AI integration. Cognition, the maker of the AI software developer assistant Devin, has already leveraged Anthropic’s prompt caching to provide more context about codebases while reducing costs and latency, according to the company’s leadership.

The real potential of these systems goes far beyond simple information retrieval. By combining current data with deep contextual understanding, AI assistants like Claude could transform knowledge work by handling substantial portions of research, analysis, and content creation — freeing humans to focus on judgment, creativity, and interpersonal aspects of their roles.

What web search means for Claude users today and tomorrow​


Web search is available now for all paid Claude users in the United States, with support for free users and international expansion planned “soon,” according to the announcement. Users can activate the feature through their profile settings.

As competition in the AI assistant space intensifies, Anthropic’s deliberate approach to expanding Claude’s capabilities while maintaining its focus on safety and transparency suggests a long-term strategy focused on building user trust alongside technical advancement.

The race between AI companies is increasingly about balancing capability with reliability and trust. Features like web search with source citations serve both goals simultaneously, providing users with more powerful tools while maintaining transparency about information sources.

With Claude now able to tap into the internet’s vast resources while maintaining its characteristic nuanced reasoning, Anthropic has eliminated a key competitive disadvantage. More importantly, the company has taken a significant step toward creating AI systems that don’t just respond to queries but actively help users navigate an increasingly complex information landscape.
 

bnew


Small models as paralegals: LexisNexis distills models to build AI assistant​


Emilia David – March 20, 2025 4:06 PM






When legal research company LexisNexis created its AI assistant Protégé, it wanted to figure out the best way to leverage its expertise without deploying a large model.

Protégé aims to help lawyers, associates and paralegals write and proofread legal documents and ensure that anything they cite in complaints and briefs is accurate. However, LexisNexis didn’t want a general-purpose legal AI assistant; it wanted to build one that learns a firm’s workflow and is more customizable.

LexisNexis saw an opportunity to harness the power of large language models (LLMs) from Anthropic and Mistral and to find the models that answer user questions best, Jeff Reihl, CTO of LexisNexis Legal and Professional, told VentureBeat.

“We use the best model for the specific use case as part of our multi-model approach. We use the model that provides the best result with the fastest response time,” Reihl said. “For some use cases, that will be a small language model like Mistral or we perform distillation to improve performance and reduce cost.”

While LLMs still provide value in building AI applications, some organizations are turning to small language models (SLMs) or distilling LLMs into smaller versions of the same model.

Distillation, where an LLM “teaches” a smaller model, has become a popular method for many organizations.
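
To make that teacher-student idea concrete, here is a minimal sketch of the classic logit-distillation loss in PyTorch, where a frozen teacher's softened output distribution supervises a smaller student. The function and the commented training step are illustrative, not anything LexisNexis has described.

```python
# Minimal logit-distillation sketch in PyTorch: a large frozen "teacher"
# produces soft targets that a smaller "student" learns to imitate.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Softening both distributions with a temperature > 1 exposes the
    # teacher's relative preferences among tokens, not just its top pick.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between student and teacher; the T^2 factor keeps
    # gradient magnitudes comparable across temperature settings.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# One training step (teacher frozen, student updated):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```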

Small models often work best for apps like chatbots or simple code completion, which is what LexisNexis wanted to use for Protégé.

This is not the first time LexisNexis has built AI applications; it was using AI even before launching its legal research hub LexisNexis + AI in July 2024.

“We have used a lot of AI in the past, which was more around natural language processing, some deep learning and machine learning,” Reihl said. “That really changed in November 2022 when ChatGPT was launched, because prior to that, a lot of the AI capabilities were kind of behind the scenes. But once ChatGPT came out, the generative capabilities, the conversational capabilities of it was very, very intriguing to us.”

Small, fine-tuned models and model routing​


Reihl said LexisNexis uses different models from most of the major model providers when building its AI platforms. LexisNexis + AI used Claude models from Anthropic, OpenAI’s GPT models and a model from Mistral.

This multi-model approach helped break down each task users wanted to perform on the platform. To do this, LexisNexis had to architect its platform to switch between models.

“We would break down whatever task was being performed into individual components, and then we would identify the best large language model to support that component. One example of that is we will use Mistral to assess the query that the user entered in,” Reihl said.

For Protégé, the company wanted faster response times and models more fine-tuned for legal use cases. So it turned to what Reihl calls “fine-tuned” versions of models, essentially smaller weight versions of LLMs or distilled models.

“You don’t need GPT-4o to do the assessment of a query, so we use it for more sophisticated work, and we switch models out,” he said.

When a user asks Protégé a question about a specific case, the first model it pings is a fine-tuned Mistral “for assessing the query, then determining what the purpose and intent of that query is” before switching to the model best suited to complete the task. Reihl said the next model could be an LLM that generates new queries for the search engine or another model that summarizes results.
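
The routing pattern Reihl describes is straightforward to sketch. The snippet below is a hypothetical illustration of the classify-then-dispatch flow, not LexisNexis's code: the model names, intent labels and the `call_model` helper are all assumptions.

```python
# Hypothetical classify-then-dispatch router: a small, cheap model labels
# the query's intent, then the request goes to the model registered for
# that task. Every name here is illustrative, not LexisNexis's stack.

def call_model(model_id: str, prompt: str) -> str:
    """Placeholder for a provider API call (Mistral, Anthropic, OpenAI...)."""
    raise NotImplementedError

ROUTES = {
    "search_query_generation": "large-general-model",  # heavier lifting
    "summarization": "mid-size-model",
    "simple_lookup": "small-finetuned-mistral",        # cheap and fast
}

def route(user_query: str) -> str:
    # Step 1: the small fine-tuned model assesses purpose and intent.
    intent = call_model(
        "small-finetuned-mistral",
        f"Classify this legal query as one of {sorted(ROUTES)}: {user_query}",
    ).strip()
    # Step 2: dispatch to the best-suited model, falling back to the
    # small model if the classifier returns an unknown label.
    return call_model(ROUTES.get(intent, "small-finetuned-mistral"), user_query)
```

The design point is the same one Reihl makes about GPT-4o below: the expensive model only runs when the task actually needs it.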

Right now, LexisNexis mostly relies on a fine-tuned Mistral model, though Reihl said it used a fine-tuned version of Claude “when it first came out; we are not using it in the product today but in other ways.” LexisNexis is also interested in using other OpenAI models, especially since the company introduced new reinforcement fine-tuning capabilities last year, and it is in the process of evaluating OpenAI’s reasoning models, including o3, for its platforms.

Reihl added that it may also look at using Gemini models from Google.

LexisNexis backs all of its AI platforms with its own knowledge graph for retrieval-augmented generation (RAG), especially as Protégé could later help launch agentic processes.

The AI legal suite​


Even before the advent of generative AI, LexisNexis tested the possibility of putting chatbots to work in the legal industry. In 2017, the company tested an AI assistant that would compete with IBM’s Watson-powered Ross. Protégé sits in the company’s LexisNexis + AI platform, which brings together the AI services of LexisNexis.

Protégé helps law firms with tasks that paralegals or associates tend to do. It helps write legal briefs and complaints grounded in firms’ documents and data, suggest next steps in legal workflows, suggest new prompts to refine searches, draft questions for depositions and discovery, link quotes in filings for accuracy, generate timelines and, of course, summarize complex legal documents.

“We see Protégé as the initial step in personalization and agentic capabilities,” Reihl said. “Think about the different types of lawyers: M&A, litigators, real estate. It’s going to continue to get more and more personalized based on the specific task you do. Our vision is that every legal professional will have a personal assistant to help them do their job based on what they do, not what other lawyers do.”

Protégé now competes against other legal research and technology platforms. Thomson Reuters customized OpenAI’s o1-mini model for its CoCounsel legal assistant. Harvey, which raised $300 million from investors including LexisNexis, also has a legal AI assistant.

 

bnew


Less is more: UC Berkeley and Google unlock LLM potential through simple sampling​


Ben Dickson – March 21, 2025 3:39 PM




A new paper by researchers from Google Research and the University of California, Berkeley, demonstrates that a surprisingly simple test-time scaling approach can boost the reasoning abilities of large language models (LLMs). The key? Scaling up sampling-based search, a technique that relies on generating multiple responses and using the model itself to verify them.

The core finding is that even a minimalist implementation of sampling-based search, using random sampling and self-verification, can elevate the reasoning performance of models like Gemini 1.5 Pro beyond that of o1-Preview on popular benchmarks. The findings can have important implications for enterprise applications and challenge the assumption that highly specialized training or complex architectures are always necessary for achieving top-tier performance.

The limits of current test-time compute scaling​


The current popular method for test-time scaling in LLMs is to train the model through reinforcement learning to generate longer responses with chain-of-thought (CoT) traces. This approach is used in models such as OpenAI o1 and DeepSeek-R1. While beneficial, these methods usually require substantial investment in the training phase.

Another test-time scaling method is “self-consistency,” where the model generates multiple responses to the query and chooses the answer that appears most often. Self-consistency reaches its limits when handling complex problems, as in these cases the most repeated answer is not necessarily the correct one.

Sampling-based search offers a simpler and highly scalable alternative to test-time scaling: Let the model generate multiple responses and select the best one through a verification mechanism. Sampling-based search can complement other test-time compute scaling strategies and, as the researchers write in their paper, “it also has the unique advantage of being embarrassingly parallel and allowing for arbitrarily scaling: simply sample more responses.”

More importantly, sampling-based search can be applied to any LLM, including those that have not been explicitly trained for reasoning.

How sampling-based search works​


The researchers focus on a minimalist implementation of sampling-based search, using a language model to both generate candidate responses and verify them. This is a “self-verification” process, where the model assesses its own outputs without relying on external ground-truth answers or symbolic verification systems.

The algorithm works in a few simple steps (a code sketch follows the list of scaling axes below):

1. The algorithm begins by generating a set of candidate solutions to the given problem using a language model. This is done by giving the model the same prompt multiple times and using a non-zero temperature setting to create a diverse set of responses.

2. Each candidate response undergoes a verification process in which the LLM is prompted multiple times to determine whether the response is correct. The verification outcomes are then averaged to create a final verification score for the response.

3. The algorithm selects the highest-scoring response as the final answer. If multiple candidates are within close range of each other, the LLM is prompted to compare them pairwise and choose the best one. The response that wins the most pairwise comparisons is chosen as the final answer.

The researchers considered two key axes for test-time scaling:

Sampling: The number of responses the model generates for each input problem.

Verification: The number of verification scores computed for each generated solution.
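
Here is a minimal sketch of those three steps and both scaling axes. The `llm` callable and the prompt wording are illustrative assumptions standing in for any text-generation API; this mirrors the shape of the paper's minimalist procedure, not its actual code.

```python
# Minimal sampling-based search with self-verification, following the
# three steps above. `llm(prompt, temperature)` stands in for any
# text-generation call; prompts and thresholds are illustrative.
from collections import defaultdict

def sample_and_verify(llm, problem, n_samples=20, n_verifications=5):
    # Step 1: sample diverse candidates at non-zero temperature
    # (the "sampling" scaling axis).
    candidates = [llm(problem, temperature=0.7) for _ in range(n_samples)]

    # Step 2: score each candidate by averaging repeated yes/no
    # self-verification passes (the "verification" scaling axis).
    scores = []
    for cand in candidates:
        verdicts = [
            llm(f"Problem: {problem}\nProposed answer: {cand}\n"
                "Is this answer correct? Reply yes or no.", temperature=0.7)
            for _ in range(n_verifications)
        ]
        scores.append(sum("yes" in v.lower() for v in verdicts) / n_verifications)

    # Step 3: take the top-scoring answer; break near-ties with
    # pairwise comparisons judged by the model itself.
    best = max(scores)
    finalists = [c for c, s in zip(candidates, scores) if best - s < 0.05]
    if len(finalists) == 1:
        return finalists[0]
    wins = defaultdict(int)
    for i, a in enumerate(finalists):
        for b in finalists[i + 1:]:
            verdict = llm(f"Problem: {problem}\nAnswer A: {a}\nAnswer B: {b}\n"
                          "Which is correct? Reply A or B.", temperature=0.0)
            wins[a if verdict.strip().lower().startswith("a") else b] += 1
    return max(finalists, key=lambda c: wins[c])
```

Both knobs scale independently: `n_samples` buys more candidates, `n_verifications` buys a less noisy score for each, and every call can run in parallel.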

How sampling-based search compares to other techniques​


The study revealed that reasoning performance continues to improve with sampling-based search, even when test-time compute is scaled far beyond the point where self-consistency saturates.

At a sufficient scale, this minimalist implementation significantly boosts reasoning accuracy on reasoning benchmarks like AIME and MATH. For example, Gemini 1.5 Pro’s performance surpassed that of o1-Preview, which has explicitly been trained on reasoning problems, and Gemini 1.5 Flash surpassed Gemini 1.5 Pro.

“This not only highlights the importance of sampling-based search for scaling capability, but also suggests the utility of sampling-based search as a simple baseline on which to compare other test-time compute scaling strategies and measure genuine improvements in models’ search capabilities,” the researchers write.

It is worth noting that while the results of search-based sampling are impressive, the costs can also become prohibitive. For example, with 200 samples and 50 verification steps per sample, a query from AIME will generate around 130 million tokens, which costs $650 with Gemini 1.5 Pro. However, this is a very minimalistic approach to sampling-based search, and it is compatible with optimization techniques proposed in other studies. With smarter sampling and verification methods, the inference costs can be reduced considerably by using smaller models and generating fewer tokens. For example, by using Gemini 1.5 Flash to perform the verification, the costs drop to $12 per question.
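
Those numbers are internally consistent: $650 for roughly 130 million tokens implies about $5 per million tokens. A back-of-envelope check, with the per-token price inferred from the article's own figures rather than any price sheet:

```python
# Back-of-envelope check of the cost figures above. The per-million-token
# price is derived from the article's numbers, not from a pricing page.
total_tokens = 130e6                 # ~130M tokens for one AIME query
pro_cost = 650.0                     # quoted cost with Gemini 1.5 Pro
implied_price = pro_cost / (total_tokens / 1e6)
print(f"implied price: ${implied_price:.2f} per 1M tokens")  # ~$5.00

# The quoted $12/question with Gemini 1.5 Flash as verifier suggests the
# 10,000 verification calls (200 samples x 50 checks), not the 200 initial
# samples, dominate the bill: swapping only the verifier cuts ~98% of cost.
flash_cost = 12.0
print(f"cost ratio Pro/Flash: {pro_cost / flash_cost:.0f}x")  # ~54x
```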

Effective self-verification strategies​


There is an ongoing debate on whether LLMs can verify their own answers. The researchers identified two key strategies for improving self-verification using test-time compute:

Directly comparing response candidates: Disagreements between candidate solutions strongly indicate potential errors. By providing the verifier with multiple responses to compare, the model can better identify mistakes and hallucinations, addressing a core weakness of LLMs. The researchers describe this as an instance of “implicit scaling.”

Task-specific rewriting: The researchers propose that the optimal output style of an LLM depends on the task. Chain-of-thought is effective for solving reasoning tasks, but responses are easier to verify when written in a more formal, mathematically conventional style. Verifiers can rewrite candidate responses into a more structured format (e.g., theorem-lemma-proof) before evaluation.

“We anticipate model self-verification capabilities to rapidly improve in the short term, as models learn to leverage the principles of implicit scaling and output style suitability, and drive improved scaling rates for sampling-based search,” the researchers write.

Implications for real-world applications​


The study demonstrates that a relatively simple technique can achieve impressive results, potentially reducing the need for complex and costly model architectures or training regimes.

This is also a scalable technique, enabling enterprises to increase performance by allocating more compute resources to sampling and verification. It also enables developers to push frontier language models beyond their limitations on complex tasks.

“Given that it complements other test-time compute scaling strategies, is parallelizable and allows for arbitrarily scaling, and admits simple implementations that are demonstrably effective, we expect sampling-based search to play a crucial role as language models are tasked with solving increasingly complex problems with increasingly large compute budgets,” the researchers write.
 

ViShawn

GTC was overwhelming but worth it, seeing what a lot of these companies are doing with AI. I'm going to get an NVIDIA certification soon
 

bnew


Hugging Face submits open-source blueprint, challenging Big Tech in White House AI policy fight​


Michael Nuñez – March 19, 2025 3:48 PM






In a Washington policy landscape increasingly dominated by calls for minimal AI regulation, Hugging Face is making a distinctly different case to the Trump administration: open-source and collaborative AI development may be America’s strongest competitive advantage.

The AI platform company, which hosts more than 1.5 million public models across diverse domains, has submitted its recommendations for the White House AI Action Plan, arguing that recent breakthroughs in open-source models demonstrate they can match or exceed the capabilities of closed commercial systems at a fraction of the cost.

In its official submission, Hugging Face highlights recent achievements like OlympicCoder, which outperforms Claude 3.7 on complex coding tasks using just 7 billion parameters, and AI2’s fully open OLMo 2 models, which match OpenAI’s o1-mini performance levels.

The submission comes as part of a broader effort by the Trump administration to gather input for its upcoming AI Action Plan, mandated by Executive Order 14179, officially titled “Removing Barriers to American Leadership in Artificial Intelligence,” which was issued in January. The Order, which replaced the Biden administration’s more regulation-focused approach, emphasizes U.S. competitiveness and reducing regulatory barriers to development.

Hugging Face’s submission stands in stark contrast to those from commercial AI leaders like OpenAI, which has lobbied heavily for light-touch regulation and “the freedom to innovate in the national interest,” while warning that America’s lead over China in AI capabilities is narrowing. OpenAI’s proposal emphasizes a “voluntary partnership between the federal government and the private sector” rather than what it calls “overly burdensome state laws.”

How open source could power America’s AI advantage: Hugging Face’s triple-threat strategy​


Hugging Face’s recommendations center on three interconnected pillars that emphasize democratizing AI technology. The company argues that open approaches enhance rather than hinder America’s competitive position.

“The most advanced AI systems to date all stand on a strong foundation of open research and open source software — which shows the critical value of continued support for openness in sustaining further progress,” the company wrote in its submission.

Its first pillar calls for strengthening open and open-source AI ecosystems through investments in research infrastructure like the National AI Research Resource (NAIRR) and ensuring broad access to trusted datasets. This approach contrasts with OpenAI’s emphasis on copyright exemptions that would allow proprietary models to train on copyrighted material without explicit permission.

“Investment in systems that can freely be re-used and adapted has also been shown to have a strong economic impact multiplying effect, driving a significant percentage of countries’ GDP,” Hugging Face noted, arguing that open approaches boost rather than hinder economic growth.

Smaller, faster, better: Why efficient AI models could democratize the technology revolution​


The company’s second pillar focuses on addressing resource constraints faced by AI adopters, particularly smaller organizations that can’t afford the computational demands of large-scale models. By supporting more efficient, specialized models that can run on limited resources, Hugging Face argues the U.S. can enable broader participation in the AI ecosystem.

“Smaller models that may even be used on edge devices, techniques to reduce computational requirements at inference, and efforts to facilitate mid-scale training for organizations with modest to moderate computational resources all support the development of models that meet the specific needs of their use context,” the submission explains.

On security—a major focus of the administration’s policy discussions—Hugging Face makes the counterintuitive case that open and transparent AI systems may be more secure in critical applications. The company suggests that “fully transparent models providing access to their training data and procedures can support the most extensive safety certifications,” while “open-weight models that can be run in air-gapped environments can be a critical component in managing information risks.”

Big tech vs. little tech: The growing policy battle that could shape AI’s future​


Hugging Face’s approach highlights growing policy divisions in the AI industry. While companies like OpenAI and Google emphasize speeding up regulatory processes and reducing government oversight, venture capital firm Andreessen Horowitz (a16z) has advocated for a middle ground, arguing for federal leadership to prevent a patchwork of state regulations while focusing regulation on specific harms rather than model development itself.

“Little Tech has an important role to play in strengthening America’s ability to compete in AI in the future, just as it has been a driving force of American technological innovation historically,” a16z wrote in its submission, using language that aligns somewhat with Hugging Face’s democratization arguments.

Google’s submission, meanwhile, focused on infrastructure investments, particularly addressing “surging energy needs” for AI deployment, a practical concern shared across industry positions.

Between innovation and access: The race to influence America’s AI future​


As the administration weighs competing visions for American AI leadership, the fundamental tension between commercial advancement and democratic access remains unresolved. OpenAI’s vision of AI development prioritizes speed and competitive advantage through a centralized approach, while Hugging Face presents evidence that distributed, open development can deliver comparable results while spreading benefits more broadly.

The economic and security arguments will likely prove decisive. If administration officials accept Hugging Face’s assertion that “a robust AI strategy must leverage open and collaborative development to best drive performance, adoption, and security,” open-source could find a meaningful place in national strategy. But if concerns about China’s AI capabilities dominate, OpenAI’s calls for minimal oversight might prevail.

What’s clear is that the AI Action Plan will set the tone for years of American technological development. As Hugging Face’s submission concludes, both open and proprietary systems have complementary roles to play — suggesting that the wisest policy might be one that harnesses the unique strengths of each approach rather than choosing between them. The question isn’t whether America will lead in AI, but whether that leadership will bring prosperity to the few or innovation for the many.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,535
Reputation
9,263
Daps
169,280

Hugging Face submits open-source blueprint, challenging Big Tech in White House AI policy fight​


Michael Nuñez – March 19, 2025 3:48 PM

March 19, 2025 3:48 PM


In a Washington policy landscape increasingly dominated by calls for minimal AI regulation, Hugging Face is making a distinctly different case to the Trump administration: open-source and collaborative AI development may be America’s strongest competitive advantage.

The AI platform company, which hosts more than 1.5 million public models across diverse domains, has submitted its recommendations for the White House AI Action Plan , arguing that recent breakthroughs in open-source models demonstrate they can match or exceed the capabilities of closed commercial systems at a fraction of the cost.

In its official submission, Hugging Face highlights recent achievements like OlympicCoder , which outperforms Claude 3.7 on complex coding tasks using just 7 billion parameters, and AI2’s fully open OLMo 2 models that match OpenAI’s o1-mini performance levels.

The submission comes as part of a broader effort by the Trump administration to gather input for its upcoming AI Action Plan, mandated by Executive Order 14179 , officially titled “Removing Barriers to American Leadership in Artificial Intelligence,” which was issued in January. The Order, which replaced the Biden administration’s more regulation-focused approach, emphasizes U.S. competitiveness and reducing regulatory barriers to development.

Hugging Face’s submission stands in stark contrast to those from commercial AI leaders like OpenAI , which has lobbied heavily for light-touch regulation and “the freedom to innovate in the national interest,” while warning about China’s narrowing lead in AI capabilities. OpenAI’s proposal emphasizes a “voluntary partnership between the federal government and the private sector” rather than what it calls “overly burdensome state laws.”

How open source could power America’s AI advantage: Hugging Face’s triple-threat strategy​


Hugging Face’s recommendations center on three interconnected pillars that emphasize democratizing AI technology. The company argues that open approaches enhance rather than hinder America’s competitive position.

“The most advanced AI systems to date all stand on a strong foundation of open research and open source software — which shows the critical value of continued support for openness in sustaining further progress,” the company wrote in its submission .

Its first pillar calls for strengthening open and open-source AI ecosystems through investments in research infrastructure like the National AI Research Resource (NAIRR) and ensuring broad access to trusted datasets. This approach contrasts with OpenAI’s emphasis on copyright exemptions that would allow proprietary models to train on copyrighted material without explicit permission.

“Investment in systems that can freely be re-used and adapted has also been shown to have a strong economic impact multiplying effect, driving a significant percentage of countries’ GDP,” Hugging Face noted, arguing that open approaches boost rather than hinder economic growth.

Smaller, faster, better: Why efficient AI models could democratize the technology revolution​


The company’s second pillar focuses on addressing resource constraints faced by AI adopters, particularly smaller organizations that can’t afford the computational demands of large-scale models. By supporting more efficient, specialized models that can run on limited resources, Hugging Face argues the U.S. can enable broader participation in the AI ecosystem.

“Smaller models that may even be used on edge devices, techniques to reduce computational requirements at inference, and efforts to facilitate mid-scale training for organizations with modest to moderate computational resources all support the development of models that meet the specific needs of their use context,” the submission explains.

On security—a major focus of the administration’s policy discussions—Hugging Face makes the counterintuitive case that open and transparent AI systems may be more secure in critical applications. The company suggests that “fully transparent models providing access to their training data and procedures can support the most extensive safety certifications,” while “open-weight models that can be run in air-gapped environments can be a critical component in managing information risks.”

Big tech vs. little tech: The growing policy battle that could shape AI’s future​


Hugging Face’s approach highlights growing policy divisions in the AI industry. While companies like OpenAI and Google emphasize speeding up regulatory processes and reducing government oversight, venture capital firm Andreessen Horowitz (a16z) has advocated for a middle ground, arguing for federal leadership to prevent a patchwork of state regulations while focusing regulation on specific harms rather than model development itself.

“Little Tech has an important role to play in strengthening America’s ability to compete in AI in the future, just as it has been a driving force of American technological innovation historically,” a16z wrote in its submission, using language that aligns somewhat with Hugging Face’s democratization arguments.

Google’s submission, meanwhile, focused on infrastructure investments, particularly addressing “surging energy needs” for AI deployment — a practical concern shared across industry positions.

Between innovation and access: The race to influence America’s AI future​


As the administration weighs competing visions for American AI leadership, the fundamental tension between commercial advancement and democratic access remains unresolved. OpenAI’s vision of AI development prioritizes speed and competitive advantage through a centralized approach, while Hugging Face presents evidence that distributed, open development can deliver comparable results while spreading benefits more broadly.

The economic and security arguments will likely prove decisive. If administration officials accept Hugging Face’s assertion that “a robust AI strategy must leverage open and collaborative development to best drive performance, adoption, and security,” open-source could find a meaningful place in national strategy. But if concerns about China’s AI capabilities dominate, OpenAI’s calls for minimal oversight might prevail.

What’s clear is that the AI Action Plan will set the tone for years of American technological development. As Hugging Face’s submission concludes, both open and proprietary systems have complementary roles to play — suggesting that the wisest policy might be one that harnesses the unique strengths of each approach rather than choosing between them. The question isn’t whether America will lead in AI, but whether that leadership will bring prosperity to the few or innovation for the many.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,535
Reputation
9,263
Daps
169,280

OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds​


Carl Franzen – March 20, 2025 11:21 AM

Credit: VentureBeat made with OpenAI ChatGPT



OpenAI’s voice AI models have gotten it into trouble before with actor Scarlett Johansson, but that isn’t stopping the company from continuing to advance its offerings in this category.

Today, the ChatGPT maker has unveiled three new proprietary voice models: gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts. These models will initially be available through OpenAI’s application programming interface (API) for third-party software developers to build their own apps. They will also be available on a custom demo site, OpenAI.fm, that individual users can access for limited testing and fun.



Moreover, the gpt-4o-mini-tts model voices can be customized from several presets via text prompt to change their accents, pitch, tone and other vocal qualities, including conveying whatever emotions the user asks for. That should go a long way toward addressing concerns that OpenAI is deliberately imitating any particular person’s voice (the company previously denied that was the case with Johansson, but pulled down the ostensibly imitative voice option anyway). Now it’s up to the user to decide how they want their AI voice to sound when speaking back.

In a demo with VentureBeat delivered over a video call, OpenAI technical staff member Jeff Harris showed how, using text alone on the demo site, a user could get the same voice to sound like a cackling mad scientist or a zen, calm yoga teacher.
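For developers, that kind of steering maps onto a single API call. Below is a minimal, hedged sketch using the OpenAI Python SDK; the `instructions` parameter is assumed to carry the persona prompt described above, and the voice name and file path are illustrative, not prescribed by the article:

```python
# Minimal sketch: steering gpt-4o-mini-tts vocal style with a text prompt.
# The `instructions` persona prompt is an assumption based on the behavior
# described above; voice name and output path are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # one of the preset voices
    input="Your experiment is ready. Shall we begin?",
    instructions="Speak like a calm, reassuring yoga teacher.",
) as response:
    response.stream_to_file("calm_greeting.mp3")
```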

Discovering and refining new capabilities within GPT-4o base​


The models are variants of the existing GPT-4o model OpenAI launched back in May 2024, which currently powers the ChatGPT text and voice experience for many users. The company took that base model and post-trained it with additional data to make it excel at transcription and speech. The company didn’t specify when the models might come to ChatGPT.

“ChatGPT has slightly different requirements in terms of cost and performance trade-offs, so while I expect they will move to these models in time, for now, this launch is focused on API users,” Harris said.

The new family is meant to supersede OpenAI’s two-year-old Whisper open-source speech-to-text model, offering lower word error rates across industry benchmarks and improved performance in noisy environments, with diverse accents, and at varying speech speeds across 100+ languages.

The company posted a chart on its website showing just how much lower the gpt-4o-transcribe models’ error rates are at identifying words across 33 languages compared to Whisper — with an impressively low 2.46% in English.

“These models include noise cancellation and a semantic voice activity detector, which helps determine when a speaker has finished a thought, improving transcription accuracy,” said Harris.

Harris told VentureBeat that the new gpt-4o-transcribe model family is not designed to offer “diarization,” or the capability to label and differentiate between different speakers. Instead, it is designed primarily to receive one (or possibly multiple) voices as a single input channel and respond to all inputs with a single output voice in that interaction, however long it takes.
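In practice, the new models are positioned as a drop-in swap for Whisper on the existing transcriptions endpoint. A minimal sketch, assuming only a model-name change (the audio file name is illustrative):

```python
# Minimal sketch: swapping Whisper for gpt-4o-transcribe on the existing
# transcriptions endpoint. The audio file name is illustrative.
from openai import OpenAI

client = OpenAI()

with open("support_call.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # previously: model="whisper-1"
        file=audio_file,
    )

print(transcript.text)
```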

The company is also hosting a competition for the general public to find the most creative examples of using its demo voice site OpenAI.fm and share them online by tagging the @openAI account on X. The winner will receive a custom Teenage Engineering radio with the OpenAI logo, which OpenAI Head of Product, Platform Olivier Godement said is one of only three in the world.

An audio applications gold mine​


The enhancements make the new models particularly well-suited for applications such as customer call centers, meeting note transcription, and AI-powered assistants.

Impressively, the Agents SDK the company launched last week also allows developers who have already built apps atop its text-based large language models, like the regular GPT-4o, to add fluid voice interactions with only about “nine lines of code,” according to a presenter during the OpenAI YouTube livestream announcing the new models.

For example, an e-commerce app built atop GPT-4o could now respond to turn-based user questions like “Tell me about my last orders” in speech, with only seconds of code changes needed to wire in the new models.
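The article doesn’t reproduce the Agents SDK’s exact nine lines, but the general pattern of wrapping an existing text app with speech can be sketched using only the standard audio endpoints. Everything here (file names, the stand-in app function, the voice) is illustrative:

```python
# Hedged sketch: adding speech to an existing text app without the Agents SDK.
# 1) transcribe the user's audio, 2) run the unchanged text logic, 3) speak the reply.
from openai import OpenAI

client = OpenAI()

def existing_text_app(user_text: str) -> str:
    # Stand-in for whatever text pipeline the app already has.
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_text}],
    )
    return chat.choices[0].message.content

# Speech in: transcribe the spoken question.
with open("question.wav", "rb") as f:
    question = client.audio.transcriptions.create(
        model="gpt-4o-transcribe", file=f
    ).text

# Existing text pipeline, unchanged.
answer = existing_text_app(question)

# Speech out: synthesize the answer.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts", voice="alloy", input=answer
) as speech:
    speech.stream_to_file("answer.mp3")
```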

“For the first time, we’re introducing streaming speech-to-text, allowing developers to continuously input audio and receive a real-time text stream, making conversations feel more natural,” Harris said.

Still, for those devs looking for low-latency, real-time AI voice experiences, OpenAI recommends using its speech-to-speech models in the Realtime API.

Pricing and availability​


The new models are available immediately via OpenAI’s API, with pricing as follows:

• gpt-4o-transcribe: $6.00 per 1M audio input tokens (~$0.006 per minute)

• gpt-4o-mini-transcribe: $3.00 per 1M audio input tokens (~$0.003 per minute)

• gpt-4o-mini-tts: $0.60 per 1M text input tokens, $12.00 per 1M audio output tokens (~$0.015 per minute)
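As a sanity check on those per-minute figures, the listed rates imply roughly 1,000 audio tokens per minute of speech. The conversion below is inferred from the table above, not an official constant:

```python
# Back-of-envelope check on the pricing above. TOKENS_PER_MINUTE is inferred
# from the listed per-minute estimates, not an official conversion factor.
PRICE_PER_MILLION_TOKENS = 6.00   # gpt-4o-transcribe, USD per 1M audio input tokens
TOKENS_PER_MINUTE = 1_000         # implied by the ~$0.006/minute estimate

cost_per_minute = PRICE_PER_MILLION_TOKENS / 1_000_000 * TOKENS_PER_MINUTE
print(f"~${cost_per_minute:.4f} per minute of audio")  # ~$0.0060
```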

However, they arrive at a time of fiercer-than-ever competition in the AI transcription and speech space, with dedicated speech AI firms such as ElevenLabs offering its new Scribe model, which supports diarization and boasts a similarly low (though not as low) error rate of 3.3% in English. It is priced at $0.40 per hour of input audio (about $0.0067 per minute, roughly in line with OpenAI’s rate).

Another startup, Hume AI, offers a new model, Octave TTS, with sentence-level and even word-level customization of pronunciation and emotional inflection — based entirely on the user’s instructions, not any pre-set voices. Octave TTS pricing isn’t directly comparable, but there is a free tier offering 10 minutes of audio, with paid tiers above that.

Meanwhile, more advanced audio and speech models are also coming to the open source community, including one called Orpheus 3B, which is available under a permissive Apache 2.0 license, meaning developers don’t have to pay any costs to run it — provided they have the right hardware or cloud servers.

Industry adoption and early results​


According to testimonials shared by OpenAI with VentureBeat, several companies have already integrated OpenAI’s new audio models into their platforms, reporting significant improvements in voice AI performance.

EliseAI, a company focused on property management automation, found that OpenAI’s text-to-speech model enabled more natural and emotionally rich interactions with tenants.

The enhanced voices made AI-powered leasing, maintenance, and tour scheduling more engaging, leading to higher tenant satisfaction and improved call resolution rates.

Decagon, which builds AI-powered voice experiences, saw a 30% improvement in transcription accuracy using OpenAI’s speech recognition model.

This increase in accuracy has allowed Decagon’s AI agents to perform more reliably in real-world scenarios, even in noisy environments. The integration process was quick, with Decagon incorporating the new model into its system within a day.

Not all reactions to OpenAI’s latest release have been warm. Dawn AI app analytics software co-founder Ben Hylak (@benhylak), a former Apple human interface designer, posted on X that while the models seem promising, the announcement “feels like a retreat from real-time voice,” suggesting a shift away from OpenAI’s previous focus on low-latency conversational AI via ChatGPT.

Additionally, the launch was preceded by an early leak on X (formerly Twitter). TestingCatalog News (@testingcatalog) posted details on the new models several minutes before the official announcement, listing the names of gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. The leak was credited to @StivenTheDev, and the post quickly gained traction.

Looking ahead, OpenAI plans to continue refining its audio models and exploring custom voice capabilities while ensuring safety and responsible AI use. Beyond audio, OpenAI is also investing in multimodal AI, including video, to enable more dynamic and interactive agent-based experiences.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,535
Reputation
9,263
Daps
169,280

Nvidia debuts Llama Nemotron open reasoning models in a bid to advance agentic AI​


Sean Michael Kerner – March 18, 2025 12:42 PM



Credit: Image generated by VentureBeat with Stable Diffusion 3.5 Large



Nvidia is getting into the open source reasoning model market.

At the Nvidia GTC event today, the AI giant made a series of hardware and software announcements. Buried amidst the big silicon announcements, the company announced a new set of open source Llama Nemotron reasoning models to help accelerate agentic AI workloads. The new models are an extension of the Nvidia Nemotron models that were first announced in January at the Consumer Electronics Show (CES).

The new Llama Nemotron reasoning models are in part a response to the dramatic rise of reasoning models in 2025. Nvidia (and its stock price) were rocked to the core earlier this year when DeepSeek R1 came out, offering the promise of an open source reasoning model and superior performance.

The Llama Nemotron family is competitive with DeepSeek, offering business-ready AI reasoning models for advanced agents.

“Agents are autonomous software systems designed to reason, plan, act and critique their work,” Kari Briski, vice president of generative AI software product management at Nvidia, said during a GTC pre-briefing with press. “Just like humans, agents need to understand context to break down complex requests, understand the user’s intent, and adapt in real time.”

What’s inside Llama Nemotron for agentic AI​


As the name implies, Llama Nemotron is based on Meta’s open source Llama models.

With Llama as the foundation, Briski said that Nvidia algorithmically pruned the model to optimize compute requirements while maintaining accuracy.

Nvidia also applied sophisticated post-training techniques using synthetic data. The training involved 360,000 H100 inference hours and 45,000 human annotation hours to enhance reasoning capabilities. All that training results in models that have exceptional reasoning capabilities across key benchmarks for math, tool calling, instruction following and conversational tasks, according to Nvidia.

The Llama Nemotron family has three different models​


The family includes three models targeting different deployment scenarios:

  • Nemotron Nano: Optimized for edge and smaller deployments while maintaining high reasoning accuracy.
  • Nemotron Super: Balanced for optimal throughput and accuracy on single data center GPUs.
  • Nemotron Ultra: Designed for maximum “agentic accuracy” in multi-GPU data center environments.

Nano and Super are available now as NIM microservices and can be downloaded from AI.NVIDIA.com; Ultra is coming soon.

Hybrid reasoning helps to advance agentic AI workloads​


One of the key features in Nvidia Llama Nemotron is the ability to toggle reasoning on or off.

The ability to toggle reasoning is an emerging capability in the AI market. Anthropic’s Claude 3.7 has somewhat similar functionality, though that model is closed and proprietary. In the open source space, IBM Granite 3.2 also has a reasoning toggle, which IBM refers to as “conditional reasoning.”

The promise of hybrid or conditional reasoning is that it allows systems to bypass computationally expensive reasoning steps for simple queries. In a demonstration, Nvidia showed how the model could engage complex reasoning when solving a combinatorial problem but switch to direct response mode for simple factual queries.
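Because NIM microservices expose an OpenAI-compatible API, the toggle can be sketched as below. The endpoint, model ID and the “detailed thinking on/off” system prompt are assumptions based on Nvidia’s published Nemotron examples, not details confirmed in this article:

```python
# Hedged sketch of Llama Nemotron's reasoning toggle via an OpenAI-compatible
# NIM endpoint. Endpoint, model ID and the system-prompt toggle are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA API catalog endpoint
    api_key="nvapi-...",                             # your NVIDIA API key
)

def ask(question: str, reasoning: bool) -> str:
    response = client.chat.completions.create(
        model="nvidia/llama-3.3-nemotron-super-49b-v1",  # illustrative model ID
        messages=[
            # Assumed toggle: Nemotron reads its reasoning mode from the system prompt.
            {"role": "system",
             "content": "detailed thinking on" if reasoning else "detailed thinking off"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is the capital of France?", reasoning=False))   # fast, direct answer
print(ask("Seat 8 guests at 2 tables so no rivals share a table.",
          reasoning=True))                                       # full reasoning engaged
```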

Nvidia Agent AI-Q blueprint provides an enterprise integration layer​


Recognizing that models alone aren’t sufficient for enterprise deployment, Nvidia also announced the Agent AI-Q blueprint, an open-source framework for connecting AI agents to enterprise systems and data sources.

“AI-Q is a new blueprint that enables agents to query multiple data types—text, images, video—and leverage external tools like web search and other agents,” Briski said. “For teams of connected agents, the blueprint provides observability and transparency into agent activity, allowing developers to improve the system over time.”

The AI-Q blueprint is set to become available in April.

Why this matters for enterprise AI adoption​


For enterprises considering advanced AI agent deployments, Nvidia’s announcements address several key challenges.

The open nature of the Llama Nemotron models allows businesses to deploy reasoning-capable AI within their own infrastructure. That’s important because it can address the data sovereignty and privacy concerns that have limited adoption of cloud-only solutions. By building the new models as NIM microservices, Nvidia is also making it easier for organizations to deploy and manage them, whether on-premises or in the cloud.

The hybrid, conditional reasoning approach is also important to note as it provides organizations with another option to choose from for this type of emerging capability. Hybrid reasoning allows enterprises to optimize for either thoroughness or speed, saving on latency and compute for simpler tasks while still enabling complex reasoning when needed.

As enterprise AI moves beyond simple applications to more complex reasoning tasks, Nvidia’s combined offering of efficient reasoning models and integration frameworks positions companies to deploy more sophisticated AI agents that can handle multi-step logical problems while maintaining deployment flexibility and cost efficiency.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,535
Reputation
9,263
Daps
169,280

Adobe previews AI generated PowerPoints from raw customer data with ‘Project Slide Wow’​


Carl Franzen – March 19, 2025 9:45 AM


Credit: VentureBeat made with ChatGPT


Today at Adobe’s annual digital innovation conference, Summit 2025 in Las Vegas, the company is unveiling Project Slide Wow, a generative AI-driven tool designed to streamline the creation of PowerPoint presentations directly from raw customer data.

Presented as part of the “Adobe Sneaks” program, the innovation aims to solve a common challenge for marketers and analysts—transforming complex data into compelling, easy-to-digest presentations.

From data to presentation—automatically​


Project Slide Wow integrates with Adobe Customer Journey Analytics (CJA) to automatically generate PowerPoint slides populated with relevant data visualizations and speaker notes. This allows marketers and business analysts to quickly build data-backed presentations without manually structuring content or designing slides.

“It’s analyzing all the charts in this project, generating captions for them, organizing them into a narrative, and creating the presentation slides,” said Jane Hoffswell, a research scientist at Adobe and the creator of Project Slide Wow, in a video call interview with VentureBeat a few days ago. “It figures out the best way to focus on the most important pieces of data.”

A standout feature of the tool is its interactive AI agent within PowerPoint. Users can ask follow-up questions, request additional visualizations, or dynamically generate new slides on the fly, making it an adaptable solution for data-driven storytelling.

One of the biggest advantages of Project Slide Wow is its ability to handle live data updates.

Instead of static presentations that quickly become outdated, users can refresh their slides to reflect the latest analytics, which is particularly valuable for businesses that rely on real-time data insights.

“We wanted this technology to be able to keep your data fresh and alive. If you give a presentation six months later, people will want to know how things have changed,” Hoffswell told VentureBeat.

The tech under the hood​


Unlike many recent AI-powered tools, Project Slide Wow does not rely on large language models (LLMs) like OpenAI’s GPT or Adobe’s Firefly.

Instead, Adobe’s research team developed a proprietary algorithmic ranking and scoring system to determine which insights are most important for a given dataset.

The system prioritizes information based on:

  • Data Structure in Adobe Customer Journey Analytics (CJA) – Insights that appear higher in the CJA workflow receive more emphasis.
  • Relevance & Frequency – Data points that appear multiple times across different analyses are given greater weight in slide generation.
  • Narrative Organization – The tool algorithmically arranges insights into a logical storytelling structure to ensure a smooth presentation flow.

“Our ranking algorithm looks at the layout of the original Customer Journey Analytics project—content higher up is likely more important,” said Hoffswell. “We also prioritize values that frequently appear in the data.”

Because the system is rules-based and deterministic rather than reliant on probabilistic AI models, it avoids common LLM issues such as hallucinated data or unpredictable outputs. It also gives enterprises greater transparency and control over how presentations are structured.
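Adobe hasn’t published the algorithm, but the two heuristics Hoffswell describes (layout position and recurrence) are easy to picture. The toy sketch below is purely illustrative; every name, weight and data shape is invented:

```python
# Purely illustrative toy version of the rules-based ranking described above.
# Charts higher in the CJA layout score higher; values that recur across
# analyses add weight. All names and weights here are invented.
from collections import Counter

def rank_charts(charts: list[dict]) -> list[dict]:
    # Count how often each value appears across the whole project.
    frequency = Counter(v for chart in charts for v in chart["values"])

    def score(chart: dict) -> float:
        position_score = 1.0 / (1 + chart["position"])       # higher up = larger score
        recurrence_score = sum(frequency[v] for v in chart["values"])
        return position_score + 0.1 * recurrence_score

    return sorted(charts, key=score, reverse=True)

charts = [
    {"title": "Checkout funnel", "position": 0, "values": ["mobile", "cart"]},
    {"title": "Traffic sources", "position": 2, "values": ["mobile", "search"]},
]
print([c["title"] for c in rank_charts(charts)])  # funnel ranks first
```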

What it means for the enterprise and decision-makers​


For CTOs, CIOs, team leaders and developer managers, Project Slide Wow represents a potential shift in how teams work with data visualization and presentations. Here’s what it means for enterprise-level decision-making:

  • Greater Efficiency for Data Teams – Analysts and marketers can rapidly generate insights in presentation-ready formats, reducing the manual labor involved in building slides from scratch.
  • Scalability for Large Organizations – By integrating directly into existing workflows, large enterprises can standardize the way customer insights are presented across departments.
  • Data Integrity & Control – Unlike AI tools that create content based on unpredictable generative models, Project Slide Wow works within the enterprise’s existing datasets in CJA. This ensures data accuracy and minimizes compliance risks.
  • Enhanced Collaboration Between Teams – By allowing presentations to be updated dynamically, multiple teams, such as marketing, analytics and product development, can work with the latest insights in real time without duplicating efforts.
  • Future Integration Potential – If Project Slide Wow becomes a full-fledged Adobe product, enterprise IT leaders may consider integrating it into their existing Microsoft 365 environments through the planned PowerPoint add-on.

Will it become a full product?​


Adobe Sneaks is an annual showcase of experimental innovations. Around 40% of featured projects eventually become Adobe products.

The fate of Project Slide Wow depends on user interest and engagement, as Adobe monitors social media conversations, customer feedback and direct inquiries to gauge demand.

Eric Matisoff, Adobe’s Digital Experience Evangelist and host of Adobe Sneaks, highlighted that the program serves as a testing ground for cutting-edge ideas.

“We start by scouring the company for hundreds of technologies and ideas…and whittle them down to the seven most exciting, entertaining, and forward-looking innovations,” Matisoff said.

Looking ahead​


For businesses that rely on data-driven decision-making, Project Slide Wow could be a major step forward in simplifying the process of building presentations. If the tool gains traction, it may soon be available as an official Adobe product—potentially transforming how companies use customer data to inform strategy.

Until then, CTOs, CIOs, team leads, and analysts should stay tuned for Adobe’s Sneaks announcements to see whether Project Slide Wow makes the leap from an experimental demo to a real-world enterprise solution.
 