bnew

Veteran
Joined
Nov 1, 2015
Messages
58,057
Reputation
8,592
Daps
161,687


Nvidia Launches Media2, the AI System Set to Improve Content Creation and Live Streaming​

Tristan Collins January 7, 2025


During CES’s headline event on Monday night, Nvidia’s founder and CEO, Jensen Huang, introduced Media2, a groundbreaking AI system set to revolutionize content creation, streaming, and live media experiences.

Wearing a shiny black leather jacket appropriate for Las Vegas, Huang shared his excitement about Nvidia’s latest advancements in a variety of tech fields, from robotics to content development.

Huang, who launched Nvidia in 1993 and has seen it grow into a tech titan with a market cap of $3.5 trillion, detailed the company’s substantial influence on sectors like Hollywood’s visual effects and animation.

He also discussed the reliance on Nvidia technology by major tech companies including Google, Microsoft, Meta, Amazon, and Tesla.

According to Richard Kerris, Nvidia’s VP and GM of Media and Entertainment, the new Media2 system employs AI to enhance the way content adapts to individual preferences, making it smarter and more impactful.

Kerris, in a blog post during the keynote, emphasized how Media2 could transform viewer experiences by enabling features like voice commands during live broadcasts, which can provide summaries and other relevant information.

The presentation also highlighted Nvidia’s latest hardware innovations, including a new chip and a desktop computer designed to support the development of AI-powered robots and autonomous vehicles.

Nvidia’s new Cosmos platform will assist in creating physical AI systems by simulating real-world conditions, further pushing the boundaries of technology.

During his talk, Huang was optimistic about the future of autonomous vehicles, predicting it to be the first multi-trillion-dollar industry in robotics.

He mentioned partnerships with companies like Shutterstock, Getty Images, and Verizon, and how Comcast’s Sky is testing Nvidia’s new offerings to enhance customer interactivity and accessibility globally.
 



Nvidia launches Cosmos world foundation model platform to accelerate physical AI​


Dean Takahashi@deantak

January 6, 2025 7:43 PM

Cosmos World Foundation AI Model helps physical AI

Image Credit: Nvidia


Nvidia has launched its Cosmos world foundation model platform to accelerate physical AI development.

In a keynote speech at CES 2025 by Nvidia CEO Jensen Huang, the company said the platform includes state-of-the-art generative world foundation models, advanced tokenizers, guardrails and an accelerated video processing pipeline built to advance the development of physical AI systems such as autonomous vehicles (AVs) and robots.

Physical AI models are costly to develop and require vast amounts of real-world data and testing. Cosmos world foundation models, or WFMs, offer developers an easy way to generate massive amounts of photoreal, physics-based synthetic data to train and evaluate their existing models. Developers can also build custom models by fine-tuning Cosmos WFMs.

Cosmos models will be available under an open model license to accelerate the work of the robotics and AV community. Developers can preview the first models on the Nvidia API catalog, or download the family of models and fine-tuning framework from the Nvidia NGC catalog or Hugging Face.

“It is trained on 20 million hours of video,” Huang said. “Nvidia Cosmos. It’s about teaching the AI to understand the physical world.”

Cosmos generates synthetic data

Leading robotics and automotive companies, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi, and XPENG, along with ridesharing giant Uber, are among the first to adopt Cosmos.

“The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own,” said Jensen Huang, founder and CEO of Nvidia, in a statement. “We created Cosmos to democratize physical AI and put general robotics in reach of every developer.”

Nvidia’s journey to CES 2025


Open world foundation models to accelerate the next wave of AI​


Nvidia Cosmos’ suite of open models means developers can customize the WFMs with datasets, such as video recordings of AV trips or robots navigating a warehouse, according to the needs of their target application.

Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data. The models are built for physically based interactions, object permanence, and high-quality generation of simulated industrial environments — like warehouses or factories — and of driving environments, including various road conditions.

In his opening keynote at CES, Huang showcased ways physical AI developers can use Cosmos models, including for:


  • Video search and understanding, enabling developers to easily find specific training scenarios, like snowy road conditions or warehouse congestion, from video data.
  • Controllable 3D-to-real synthetic data generation, using Cosmos models to generate photoreal videos from controlled 3D scenarios developed in the Nvidia Omniverse platform.
  • Physical AI model development and evaluation, whether building a custom model on the foundation models, improving the models using Cosmos for reinforcement learning or testing how they perform given a specific simulated scenario.
  • Foresight — the ability to predict the results of a physical AI model’s next potential actions — to help it select the best action to follow.
  • Multiverse simulation, using Cosmos and Omniverse to generate every possible future outcome an AI model could take to help it select the best and most accurate path.
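The "foresight" and "multiverse simulation" ideas in the list above both amount to rolling a predictive model forward for each candidate action and keeping the one with the best predicted outcome. A minimal plain-Python sketch of that selection loop, using a toy stand-in world model and scorer (Cosmos itself generates physics-aware video, not one-dimensional states, so everything here is illustrative):

```python
# Sketch of action selection via a predictive world model.
# The "world model" and scorer are toy stand-ins, not Cosmos APIs.

def predict_next_state(state, action):
    """Toy world model: the agent moves along a number line."""
    return state + action

def score(state, goal):
    """Higher is better: negative distance to the goal."""
    return -abs(goal - state)

def select_best_action(state, candidate_actions, goal):
    """Foresight: simulate every candidate future, keep the best one."""
    outcomes = {a: predict_next_state(state, a) for a in candidate_actions}
    return max(outcomes, key=lambda a: score(outcomes[a], goal))

best = select_best_action(state=0, candidate_actions=[-1, 0, 2, 5], goal=4)
print(best)  # → 5, the action whose predicted outcome lands closest to the goal
```

Multiverse simulation is the same loop with many candidate futures evaluated in parallel rather than a handful in sequence.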


Advanced world model development tools​


Nvidia is marrying tech for AI in the physical world with digital twins.

Building physical AI models requires petabytes of video data and tens of thousands of compute hours to process, curate and label that data. To help save enormous costs in data curation, training and model customization, Cosmos features:

  • An Nvidia AI and CUDA-accelerated data processing pipeline, powered by Nvidia NeMo Curator, that enables developers to process, curate and label 20 million hours of videos in 14 days using the Nvidia Blackwell platform, instead of 3.4 years using a CPU-only pipeline.
  • Nvidia Cosmos Tokenizer, a state-of-the-art visual tokenizer for converting images and videos into tokens. It delivers eight times more total compression and 12 times faster processing than today’s leading tokenizers.
  • The Nvidia NeMo framework for highly efficient model training, customization and optimization.
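The curation speedup claimed above can be sanity-checked with back-of-the-envelope arithmetic: 3.4 years of CPU-only processing versus 14 days on Blackwell is roughly a 90x reduction in wall-clock time.

```python
# Back-of-the-envelope check on the stated curation speedup:
# 20 million hours of video, 3.4 years on CPUs vs. 14 days on Blackwell.
cpu_days = 3.4 * 365.25          # CPU-only pipeline duration, in days
gpu_days = 14                    # Blackwell pipeline duration
speedup = cpu_days / gpu_days
print(f"{speedup:.0f}x")         # ≈ 89x wall-clock reduction

hours_per_day_gpu = 20_000_000 / gpu_days
print(f"{hours_per_day_gpu:,.0f} video-hours curated per day")
```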


World’s largest physical AI industries adopt Cosmos​


Pioneers across the physical AI industry are already adopting Cosmos technologies.

1X, an AI and humanoid robot company, launched the 1X World Model Challenge dataset using Cosmos Tokenizer. XPENG will use Cosmos to accelerate the development of its humanoid robot. And Hillbot and Skild AI are using Cosmos to fast-track the development of their general-purpose robots.

“Data-scarcity and variability are key challenges to successful learning in robot environments,” said Pras Velagapudi, chief technology officer at Agility, in a statement. “Cosmos’ text-, image- and video-to-world capabilities allow us to generate and augment photorealistic scenarios in a variety of tasks that we can use to train models without needing as much expensive, real-world data capture.”

Transportation leaders are also using Cosmos to build physical AI for AVs.

Waabi, a company pioneering generative AI for the physical world, will use Cosmos for the search and curation of video data for AV software development and simulation.

Wayve, which is developing AI foundation models for autonomous driving, is evaluating Cosmos as a tool to search for edge and corner case driving scenarios used for safety and validation.

AV toolchain provider Foretellix will use Cosmos, alongside Nvidia Omniverse Sensor RTX APIs, to evaluate and generate high-fidelity testing scenarios and training data at scale.

Uber is partnering with Nvidia to accelerate autonomous mobility. Rich driving datasets from Uber, combined with the features of the Cosmos platform and Nvidia DGX Cloud, will help AV partners build stronger AI models even more efficiently.

“Generative AI will power the future of mobility, requiring both rich data and very powerful compute,” said Dara Khosrowshahi, CEO of Uber. “By working with Nvidia, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry.”


Developing open, safe and responsible AI​


Cosmos enables machines to understand the physical world.

Nvidia Cosmos was developed in line with Nvidia’s “trustworthy AI” principles, which prioritize privacy, safety, security, transparency and reducing unwanted bias.

Trustworthy AI is essential for fostering innovation within the developer community and maintaining user trust. Nvidia is committed to safe and trustworthy AI, in line with the White House’s voluntary AI commitments and other global AI safety initiatives.

The open Cosmos platform includes guardrails designed to mitigate harmful text and images, and features a tool to enhance text prompts for accuracy. Videos generated with Cosmos autoregressive and diffusion models on the Nvidia API catalog include invisible watermarks to identify AI-generated content, helping reduce the chances of misinformation and misattribution.

Nvidia encourages developers to adopt trustworthy AI practices and further enhance guardrail and watermarking solutions for their applications.


Availability​


You can use Cosmos to train physical robots.

Cosmos WFMs are now available under Nvidia’s open model license on Hugging Face and the Nvidia NGC catalog. Cosmos models will soon be available as fully optimized Nvidia NIM microservices.

Developers can access Nvidia NeMo Curator for accelerated video processing and customize their own world models with Nvidia NeMo. Nvidia DGX Cloud offers a fast and easy way to deploy these models, with enterprise support available through the Nvidia AI Enterprise software platform.

Nvidia also announced new Nvidia Llama Nemotron large language models and Nvidia Cosmos Nemotron vision language models that developers can use for enterprise AI use cases in healthcare, financial services, manufacturing and more.
 



Nvidia’s AI agent play is here with new models, orchestration blueprints​


Emilia David@miyadavid

January 6, 2025 8:30 PM





The industry’s push into agentic AI continues, with Nvidia announcing several new services and models to facilitate the creation and deployment of AI agents.

Today, Nvidia launched Nemotron, a family of models based on Meta’s Llama and trained on the company’s techniques and datasets. The company also announced new AI orchestration blueprints to guide AI agents. These releases bring Nvidia, a company better known for the hardware powering the generative AI revolution, to the forefront of agentic AI development.

Nemotron comes in three sizes: Nano, Super and Ultra. It also comes in two flavors: the Llama Nemotron for language tasks and the Cosmos Nemotron vision model for physical AI projects. The Llama Nemotron Nano has 4B parameters, the Super 49B parameters and the Ultra 253B parameters.

All three work best for agentic tasks including “instruction following, chat, function calling, coding and math,” according to the company.

Rev Lebaredian, VP of Omniverse and simulation technology at Nvidia, said in a briefing with reporters that the three sizes are optimized for different Nvidia computing resources: Nano for cost-efficient, low-latency applications on PCs and edge devices; Super for high accuracy and throughput on a single GPU; and Ultra for the highest accuracy at data center scale.

“AI agents are the digital workforce that will work for us and work with us, and so the Nemotron model family is for agentic AI,” said Lebaredian.

The Nemotron models are available as hosted APIs on Hugging Face and Nvidia’s website. Nvidia said enterprises can access the models through its AI Enterprise software platform.
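Nvidia's hosted model APIs generally expose an OpenAI-compatible chat interface, so the hosted Nemotron models can be queried with the standard `openai` client. A hedged sketch — the model identifier below is an assumption based on last year's Nemotron release; check the Nvidia API catalog for the identifiers of the new Nano/Super/Ultra variants:

```python
# Hedged sketch: querying a hosted Nemotron model through Nvidia's
# OpenAI-compatible endpoint. The model id is an assumption; look up
# the published identifiers in the Nvidia API catalog before relying on it.
import os

BASE_URL = "https://integrate.api.nvidia.com/v1"
MODEL = "nvidia/llama-3.1-nemotron-70b-instruct"  # assumed identifier

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Assemble a chat-completion payload in the OpenAI wire format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 256,
    }

if os.environ.get("NVIDIA_API_KEY"):  # only call out if a key is configured
    from openai import OpenAI  # pip install openai
    client = OpenAI(base_url=BASE_URL, api_key=os.environ["NVIDIA_API_KEY"])
    resp = client.chat.completions.create(**build_chat_request("Summarize CES."))
    print(resp.choices[0].message.content)
```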

Nvidia is no stranger to foundation models. Last year, it quietly released a version of Nemotron, Llama-3.1-Nemotron-70B-Instruct, that outperformed similar models from OpenAI and Anthropic. It also unveiled NVLM 1.0, a family of multimodal language models.


More support for agents​

AI agents became a big trend in 2024 as enterprises began exploring how to deploy agentic systems in their workflows. Many believe that momentum will continue this year.

Companies like Salesforce, ServiceNow, AWS and Microsoft have all called agents the next wave of gen AI in enterprises. AWS has added multi-agent orchestration to Bedrock, while Salesforce released its Agentforce 2.0, bringing more agents to its customers.

However, agentic workflows still need other infrastructure to work efficiently. One such infrastructure revolves around orchestration, or managing multiple agents crossing different systems.
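At its core, orchestration means routing each step of a task to the agent responsible for it and passing results along. A generic plain-Python illustration — this is not the API of any framework named in the article, just the shape of the pattern:

```python
# Minimal illustration of agent orchestration: a registry of agents and
# a loop that routes each workflow step to the right one, feeding each
# agent the previous agent's output. Generic sketch, not a vendor API.

def research_agent(task):
    return f"notes on {task}"

def writer_agent(material):
    return f"draft based on {material}"

AGENTS = {"research": research_agent, "write": writer_agent}

def orchestrate(task, steps):
    """Run the named agents in order, chaining outputs to inputs."""
    result = task
    for role in steps:
        result = AGENTS[role](result)
    return result

print(orchestrate("CES 2025", ["research", "write"]))
# → draft based on notes on CES 2025
```

Real orchestration frameworks add the hard parts this sketch omits: agents on different systems, retries, parallel branches and shared state.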


Orchestration blueprints​


Nvidia has also entered the emerging field of AI orchestration with its blueprints that guide agents through specific tasks.

The company has partnered with several orchestration companies, including LangChain, LlamaIndex, CrewAI, Daily and Weights and Biases, to build blueprints on Nvidia AI Enterprise. Each orchestration framework has developed its own blueprint with Nvidia. For example, CrewAI created a blueprint for code documentation to ensure code repositories are easy to navigate. LangChain added Nvidia NIM microservices to its structured report generation blueprint to help agents return internet searches in different formats.

“Making multiple agents work together smoothly, or orchestration, is key to deploying agentic AI,” said Lebaredian. “These leading AI orchestration companies are integrating every Nvidia agentic building block (NIM, NeMo and Blueprints) with their open-source agentic orchestration platforms.”

Nvidia’s new PDF-to-podcast blueprint aims to compete with Google’s NotebookLM by converting information from PDFs to audio. Another new blueprint will help build agents to search for and summarize videos.

Lebaredian said Blueprints aims to help developers quickly deploy AI agents. To that end, Nvidia unveiled Nvidia Launchables, a platform that lets developers test, prototype and run blueprints in one click.

Orchestration could be one of the bigger stories of 2025 as enterprises grapple with multi-agent production.
 



Nvidia unveils Isaac GR00T blueprint to accelerate humanoid robotics​


Dean Takahashi@deantak

January 6, 2025 8:08 PM



Nvidia Isaac GR00T makes it easier to design humanoid robots.

Image Credit: Nvidia


Nvidia has announced an Isaac GR00T blueprint to accelerate humanoid robotics development.

At Nvidia CEO Jensen Huang’s CES 2025 keynote, Huang said that Isaac GR00T workflows for synthetic data and Nvidia Cosmos world foundation models will supercharge development of general humanoid robots.


Robots on the march​


Over the next two decades, the market for humanoid robots is expected to reach $38 billion. To address this significant demand, particularly in the industrial and manufacturing sectors, Nvidia is releasing a collection of robot foundation models, data pipelines and simulation frameworks to accelerate next-generation humanoid robot development.

The Nvidia Isaac GR00T blueprint for synthetic motion generation helps developers generate exponentially large synthetic motion data to train their humanoids using imitation learning.

Imitation learning — a subset of robot learning — enables humanoids to acquire new skills by observing and mimicking expert human demonstrations. Collecting these extensive, high-quality datasets in the real world is tedious, time-consuming and often prohibitively expensive.

Implementing the Isaac GR00T blueprint for synthetic motion generation allows developers to easily generate exponentially large synthetic datasets from just a small number of human demonstrations.

Starting with the GR00T-Teleop workflow, users can tap into Apple Vision Pro to capture human actions in a digital twin. These human actions are mimicked by a robot in simulation and recorded for use as ground truth.

The GR00T-Mimic workflow then multiplies the captured human demonstration into a larger synthetic motion dataset. Finally, the GR00T-Gen workflow, built on the Nvidia Omniverse and Nvidia Cosmos platforms, exponentially expands this dataset through domain randomization and 3D upscaling.

The dataset can then be used as an input to the robot policy, which teaches robots how to move and interact with their environment effectively and safely in Nvidia Isaac Lab, an open-source and modular framework for robot learning.
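The GR00T-Mimic idea — multiplying a handful of captured demonstrations into a much larger synthetic set — can be illustrated with a toy trajectory-perturbation loop. The real workflow uses physics-aware simulation and domain randomization; Gaussian jitter here is purely illustrative:

```python
import random

# Toy version of demonstration multiplication: jitter a few recorded
# trajectories to synthesize many variants. GR00T-Mimic does this with
# physics-aware simulation; Gaussian noise here is just for illustration.

def multiply_demos(demos, copies_per_demo, noise=0.01, seed=0):
    """Expand each demo trajectory into `copies_per_demo` noisy variants."""
    rng = random.Random(seed)
    synthetic = []
    for traj in demos:
        for _ in range(copies_per_demo):
            synthetic.append([x + rng.gauss(0, noise) for x in traj])
    return synthetic

human_demos = [[0.0, 0.1, 0.2], [0.0, 0.05, 0.15]]  # 2 captured demonstrations
dataset = multiply_demos(human_demos, copies_per_demo=500)
print(len(dataset))  # → 1000 synthetic trajectories from 2 demonstrations
```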


World foundation models narrow the sim-to-real gap​


Which one is not the robot?

Also at CES, Nvidia announced Cosmos, a platform featuring a family of open, pretrained world foundation models purpose-built for generating physics-aware videos and world states for physical AI development. It includes autoregressive and diffusion models in a variety of sizes and input-data formats. The models were trained on 18 quadrillion tokens, including 2 million hours of autonomous driving, robotics, drone footage and synthetic data.

In addition to helping generate large datasets, Cosmos can reduce the simulation-to-real gap by upscaling images from 3D to real. Combining Omniverse — a developer platform of application programming interfaces and microservices for building 3D applications and services — with Cosmos is critical, because it helps minimize potential hallucinations commonly associated with world models by providing crucial safeguards through its highly controllable, physically accurate simulations.


An expanding ecosystem​


Nvidia GR00T generates synthetic data for robots.

Collectively, Nvidia Isaac GR00T, Omniverse and Cosmos are helping physical AI and humanoid innovation take a giant leap forward. Major robotics companies including Boston Dynamics and Figure have started adopting and demonstrating results with Isaac GR00T.

Humanoid software, hardware and robot manufacturers can apply for early access to Nvidia’s humanoid robot developer program.





1/11
@TheHumanoidHub
Jensen, at the CES 2025 stage with 14 humanoid robots standing in the background, announced NVIDIA Isaac GR00T Blueprint.

It's a simulation workflow for synthetic motion generation, enabling developers to create large datasets for training humanoids using imitation learning.



https://video.twimg.com/ext_tw_video/1876489889535180800/pu/vid/avc1/1280x720/Vjmd5uZd6tgoGvx_.mp4

2/11
@victor_explore
bet those humanoids still can't fold laundry though 🤖



3/11
@BriscoeCrainIV
It’s happening so fast! 🦾



4/11
@HunterReveur
Impressive. I’m now wondering if we’ll see a public release by summer’25.



5/11
@kianerfaan
iron man 2 vibes





6/11
@Mnewbis
Straight up thought this was William shatner for a second 😂



7/11
@LawStud0842619
Why no @Tesla_Optimus ?



8/11
@EERandomness
Interesting that Optimus was not in the lineup?



9/11
@BLUECOW009
How do I get this developer kit?



10/11
@argoexp
ARBE anyone? Nvidia just invested in them yesterday



11/11
@__plotnikova
Really impressive how fast this is happening




 



Nvidia using GenAI to integrate Omniverse virtual creations into physical AI apps​


Dean Takahashi@deantak

January 6, 2025 8:05 PM



Nvidia is helping companies design in Omniverse in digital form and take that into the physical AI world.

Image Credit: Nvidia


Nvidia unveiled generative AI models and blueprints that expand Nvidia Omniverse integration further into physical AI applications such as robotics, autonomous vehicles and vision AI.

As part of the CES 2025 opening keynote by Nvidia CEO Jensen Huang, the company said global leaders in software development and professional services are using Omniverse to develop new products and services that will accelerate the next era of industrial AI.

Accenture, Altair, Ansys, Cadence, Foretellix, Microsoft and Neural Concept are among the first to integrate Omniverse into their next-generation software products and professional services. Siemens, a leader in industrial automation, announced today at the CES trade show the availability of Teamcenter Digital Reality Viewer — the first Siemens Xcelerator application powered by Nvidia Omniverse libraries.

“Physical AI will revolutionize the $50 trillion manufacturing and logistics industries. Everything that moves — from cars and trucks to factories and warehouses — will be robotic and embodied by AI,” said Huang, in a statement. “Nvidia’s Omniverse digital twin operating system and Cosmos physical AI serve as the foundational libraries for digitalizing the world’s physical industries.”


New models and frameworks accelerate world-building for physical AI​


Creating 3D worlds for physical AI simulation requires three steps: world-building, labeling the world with physical attributes and making it photoreal.

Nvidia offers generative AI models that accelerate each step. The USD Code and USD Search Nvidia NIM microservices are now generally available. They let developers use text prompts to generate or search for OpenUSD assets. A new Nvidia Edify SimReady generative AI model unveiled today can automatically label existing 3D assets with attributes like physics or materials, enabling developers to process 1,000 3D objects in minutes instead of over 40 hours manually.

Nvidia Omniverse, paired with new Nvidia Cosmos world foundation models, creates a synthetic data multiplication engine that lets developers easily generate massive amounts of controllable, photoreal synthetic data. Developers can compose 3D scenarios in Omniverse and render images or videos as outputs. These can then be used with text prompts to condition Cosmos models to generate countless synthetic virtual environments for physical AI training.


Nvidia Omniverse blueprints speed up industrial, robotic workflows​


cosmos-4.jpg
Cosmos generates synthetic driving data.

During the CES keynote, Nvidia also announced four new blueprints that make it easier for developers to build universal scene description (OpenUSD)-based Omniverse digital twins for physical AI. The blueprints are:

  • Mega, powered by Omniverse Sensor RTX APIs, for developing and testing robot fleets at scale in an industrial factory or warehouse digital twin before deployment in real-world facilities
  • Autonomous Vehicle (AV) Simulation, also powered by Omniverse Sensor RTX APIs, that lets AV developers replay driving data, generate new ground-truth data and perform closed-loop testing to accelerate their development pipelines
  • Omniverse spatial streaming to Apple Vision Pro that helps developers create applications for immersive streaming of large-scale industrial digital twins to Apple Vision Pro
  • Real-time digital twins for computer aided engineering (CAE), a reference workflow built on Nvidia CUDA-X acceleration, physics AI and Omniverse libraries that enables real-time physics visualization

New, free “Learn OpenUSD” courses are also now available to help developers build OpenUSD-based worlds faster than ever.


Market leaders supercharge industrial AI using Nvidia Omniverse​


Global leaders in software development and professional services are using Omniverse to develop new products and services that are poised to accelerate the next era of industrial AI.

Building on its adoption of Omniverse libraries in its Reality Digital Twin platform for data centers, Cadence, a leader in electronic systems design, announced further integration of Omniverse into Allegro, its leading electronic computer-aided design application used by the world’s largest semiconductor companies.

Altair, a leader in computational intelligence, is adopting the Omniverse blueprint for real-time CAE digital twins for interactive computational fluid dynamics (CFD). Ansys is adopting Omniverse into Ansys Fluent, a leading CAE application. And Neural Concept is integrating Omniverse libraries into its next-generation software products, enabling real-time CFD and enhancing engineering workflows.

Accenture, a leading global professional services company, is using Mega to help German supply chain solutions leader Kion build next-generation autonomous warehouses and robotic fleets for its network of global warehousing and distribution customers.

AV toolchain provider Foretellix, a leader in data-driven autonomy development, is using the AV simulation blueprint to enable full 3D sensor simulation for optimized AV testing and validation. Research organization MITRE is also deploying the blueprint, in collaboration with the University of Michigan’s Mcity testing facility, to create an industry-wide AV validation platform.

Katana Studio is using the Omniverse spatial streaming workflow to create custom car configurators for Nissan and Volkswagen, allowing them to design and review car models in an immersive experience while improving the customer decision-making process.

Innoactive, an XR streaming platform for enterprises, leveraged the workflow to add platform support for spatial streaming to Apple Vision Pro. The solution enables Volkswagen Group to conduct design and engineering project reviews at human-eye resolution. Innoactive also collaborated with Syntegon, a provider of processing and packaging technology solutions for pharmaceutical production, to enable Syntegon’s customers to walk through and review digital twins of custom installations before they are built.
 



1/11
@TheHumanoidHub
NVIDIA just introduced Cosmos, a platform for world foundation models designed for robotics.

⦿ It features advanced tokenizers, an AI-accelerated data pipeline, and integration with NVIDIA Omniverse. Humanoid makers 1X, Figure, and Agility are among the first to adopt Cosmos.

⦿ Cosmos generates synthetic, physics-based data, accelerating model training and customization.

⦿ It also features a CUDA-accelerated data processing pipeline that enables developers to process, curate, and label 20 million hours of videos in 14 days using the NVIDIA Blackwell platform.



https://video.twimg.com/ext_tw_video/1876485245312339968/pu/vid/avc1/1280x720/Res7shCLkFbL1Vsm.mp4

2/11
@TheHumanoidHub
Technical Blog:
NVIDIA Launches Cosmos World Foundation Model Platform to Accelerate Physical AI Development



3/11
@Brenten55
This Blew Me Away !!! 😮

Entire video:
https://invidious.poast.org/live/k82RwXqZHY8?si=XBzhsKBcFFgyfCJK



4/11
@lwasinam
Speaking robot: Our new AI model translates vision and language into robotic actions



5/11
@steve_ike_
NVIDIA is cooking! Wonder if this helps some of the automakers who have fallen behind in self driving tech catch up?



6/11
@leo_grundstrom
Can this be used for video generation?



7/11
@robot_machines
This is the foreseeable future of robotics.



8/11
@AppyPieInc
Game-changer for robotics! Cosmos is setting a new standard for scalable training and deployment. Excited to see how pioneers like 1X and Agility leverage this!



9/11
@BriscoeCrainIV
I’m buying more NVIDIA in the morning! 😂



10/11
@lateboomer88
This sounds like the pre version of terminator Skynet.



11/11
@saumil_chandira
Enabling both offline and online agents in one keynote




 



Nvidia’s Nemotron model families will advance AI agents​


Dean Takahashi@deantak

January 6, 2025 7:30 PM



Nvidia Nemotron Model Families

Image Credit: Nvidia



As part of its bevy of AI announcements at CES 2025 today, Nvidia announced Nemotron model families to advance agentic AI.

Available as Nvidia NIM microservices, open Llama Nemotron large language models and Cosmos Nemotron vision language models can supercharge AI agents on any accelerated system.

Nvidia made the announcement as part of CEO Jensen Huang’s opening keynote today at CES 2025.


Agentic AI​


Artificial intelligence is entering a new era — the age of agentic AI — where teams of specialized agents can help people solve complex problems and automate repetitive tasks.

With custom AI agents, enterprises across industries can manufacture intelligence and achieve unprecedented productivity. These advanced AI agents require a system of multiple generative AI models optimized for agentic AI functions and capabilities. This complexity means that the need for powerful, efficient enterprise-grade models has never been greater.

“AI agents is the next robotic industry and likely to be a multibillion-dollar opportunity,” Huang said.

The Llama Nemotron family of open large language models (LLMs) is intended to provide a foundation for enterprise agentic AI. Built with Llama, the models can help developers create and deploy AI agents across a range of applications, including customer support, fraud detection, and product supply chain and inventory management optimization.

To be effective, many AI agents need both language skills and the ability to perceive the world and respond with the appropriate action.


Words and Visuals​


Nvidia Nemotron

With the new Nvidia Cosmos Nemotron vision language models (VLMs) and Nvidia NIM microservices for video search and summarization, developers can build agents that analyze and respond to images and video from autonomous machines, hospitals, stores and warehouses, as well as sports events, movies and news. For developers seeking to generate physics-aware videos for robotics and autonomous vehicles, Nvidia today separately announced Nvidia Cosmos world foundation models.

The Nemotron models optimize compute efficiency and accuracy for AI agents built with Llama foundation models — one of the most popular commercially viable open-source model collections, downloaded over 650 million times — and provide optimized building blocks for AI agent development.

The models are pruned and trained with Nvidia’s latest techniques and high-quality datasets for enhanced agentic capabilities. They excel at instruction following, chat, function calling, coding and math, while being size-optimized to run on a broad range of Nvidia accelerated computing resources.

“Agentic AI is the next frontier of AI development, and delivering on this opportunity requires full-stack optimization across a system of LLMs to deliver efficient, accurate AI agents,” said Ahmad Al-Dahle, vice president and head of GenAI at Meta, in a statement. “Through our collaboration with Nvidia and our shared commitment to open models, the Nvidia Llama Nemotron family built on Llama can help enterprises quickly create their own custom AI agents.”


Early adopters​


Leading AI agent platform providers including SAP and ServiceNow are expected to be among the first to use the new Llama Nemotron models.

“AI agents that collaborate to solve complex tasks across multiple lines of the business will unlock a whole new level of enterprise productivity beyond today’s generative AI scenarios,” said Philipp Herzig, chief AI officer at SAP, in a statement. “Through SAP’s Joule, hundreds of millions of enterprise users will interact with these agents to accomplish their goals faster than ever before. Nvidia’s new open Llama Nemotron model family will foster the development of multiple specialized AI agents to transform business processes.”

“AI agents make it possible for organizations to achieve more with less effort, setting new standards for business transformation,” said Jeremy Barnes, vice president of platform AI at ServiceNow, in a statement. “The improved performance and accuracy of Nvidia’s open Llama Nemotron models can help build advanced AI agent services that solve complex problems across functions, in any industry.”

The Nvidia Llama Nemotron models use Nvidia NeMo for distilling, pruning and alignment. Using these techniques, the models are small enough to run on a variety of computing platforms while providing high accuracy as well as increased model throughput.

The Nemotron models will be available as downloadable models and as Nvidia NIM microservices that can be easily deployed on clouds, data centers, PCs and workstations. They are intended to offer enterprises industry-leading performance with reliable, secure and seamless integration into their agentic AI application workflows.
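NIM microservices expose an OpenAI-compatible HTTP endpoint, so a deployed Nemotron NIM can be queried with the standard `openai` client. A minimal sketch, assuming Nvidia's hosted endpoint and a placeholder model name (check build.nvidia.com for the models actually available to you):

```python
# Sketch of querying a Llama Nemotron NIM microservice through its
# OpenAI-compatible chat-completions endpoint. The base URL points at
# Nvidia's hosted API; the model name below is a placeholder.
import os


def build_chat_request(model: str, user_message: str, temperature: float = 0.2) -> dict:
    """Assemble the JSON body for a chat-completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }


if __name__ == "__main__":
    # Requires `pip install openai` and an API key (or a locally running NIM);
    # the call is skipped here when no key is configured.
    api_key = os.environ.get("NVIDIA_API_KEY")
    if api_key:
        from openai import OpenAI

        client = OpenAI(base_url="https://integrate.api.nvidia.com/v1", api_key=api_key)
        req = build_chat_request("nvidia/llama-nemotron-example",
                                 "Summarize our Q3 returns policy.")
        print(client.chat.completions.create(**req).choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire format, the same request body works against a self-hosted NIM by changing only `base_url`.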


Customize and connect to business knowledge with Nvidia NeMo​


The Llama Nemotron and Cosmos Nemotron model families are coming in Nano, Super and Ultra sizes to provide options for deploying AI agents at every scale.

● Nano: The most cost-effective model optimized for real-time applications with low latency, ideal for deployment on PCs and edge devices

● Super: A high-accuracy model offering exceptional throughput on a single GPU

● Ultra: The highest-accuracy model, designed for data-center-scale applications demanding the highest performance
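The tier guidance above can be encoded as a small selection helper. The tier names are Nvidia's; the mapping from deployment target to tier is an illustrative assumption, not an official rule:

```python
# Hypothetical helper mapping a deployment target to a Nemotron model tier,
# following the Nano/Super/Ultra descriptions above.
def pick_nemotron_tier(target: str) -> str:
    """Return a suggested Nemotron tier for a deployment target."""
    tiers = {
        "edge": "Nano",         # low latency on PCs and edge devices
        "pc": "Nano",
        "single-gpu": "Super",  # high accuracy and throughput on one GPU
        "datacenter": "Ultra",  # highest accuracy at data-center scale
    }
    try:
        return tiers[target.lower()]
    except KeyError:
        raise ValueError(f"unknown deployment target: {target!r}")
```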

Enterprises can also customize the models for their specific use cases and domains with Nvidia NeMo microservices to simplify data curation, accelerate model customization and evaluation, and apply guardrails to keep responses on track.

With Nvidia NeMo Retriever, developers can also integrate retrieval-augmented generation (RAG) capabilities to connect models to their enterprise data.
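The RAG pattern itself is simple to sketch: retrieve the documents most relevant to a query, then prepend them to the prompt. The toy retriever below uses bag-of-words cosine similarity as a stand-in for the embedding models NeMo Retriever provides at production scale:

```python
# Toy sketch of retrieval-augmented generation (RAG): rank documents by
# similarity to the query, then build an augmented prompt. A real pipeline
# would use learned embeddings instead of word counts.
import math
from collections import Counter


def _vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = _vectorize(query)
    ranked = sorted(documents, key=lambda d: _cosine(qv, _vectorize(d)), reverse=True)
    return ranked[:k]


def augment_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved context to the user's question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The augmented prompt is then sent to the LLM, which grounds its answer in the retrieved enterprise data rather than only its training set.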

And using Nvidia Blueprints for agentic AI, enterprises can create their own applications using Nvidia’s advanced AI tools and end-to-end development expertise. In fact, Nvidia Cosmos Nemotron, Nvidia Llama Nemotron and NeMo Retriever supercharge the new Nvidia Blueprint for video search and summarization (announced separately today).

NeMo, NeMo Retriever and Nvidia Blueprints are all available with the Nvidia AI Enterprise software platform.


Availability​


Llama Nemotron and Cosmos Nemotron models will be available as hosted APIs and for download on build.nvidia.com and on Hugging Face. Access for development, testing and research is free for members of the Nvidia Developer Program.

Enterprises can run Llama Nemotron and Cosmos Nemotron NIM microservices in production with the Nvidia AI Enterprise software platform on accelerated data center and cloud infrastructure.
 

1/41
@heypearlai
Bye Bye ChatGPT

Nvidia just launched Chat with RTX and it's free for everyone.

Here are the details (+ how to download):



GgCvmScXsAA8AmO.png


2/41
@heypearlai
Discover Chat with RTX

NVIDIA's new software that allows you to host your own AI chatbot on your PC.

Experience privacy and speed with local, offline operation.

Some incredible features:



https://video.twimg.com/ext_tw_video/1873690649737752576/pu/vid/avc1/1280x720/BQigqqlkRAYpP6_6.mp4

3/41
@heypearlai
1. Operates locally on your PC

Chat with RTX runs entirely on your local PC.

You can choose a whole folder as the dataset and ask questions about it.



https://video.twimg.com/ext_tw_video/1873690761482379264/pu/vid/avc1/1280x720/3QNP_dtoBCozU2TJ.mp4

4/41
@heypearlai
2. YouTube Helper

This tool can read YouTube video transcripts and provide answers to your questions.

Just copy a YouTube video URL, paste it here, and ask anything about the video.



https://video.twimg.com/ext_tw_video/1873690901286920192/pu/vid/avc1/1280x720/l5YOP26Ut8JgvVur.mp4

5/41
@heypearlai
3. Select an AI Model

You have the option to select your AI model from Llama or Mistral.

Mistral 7B is a popular choice.



https://video.twimg.com/ext_tw_video/1873690985563058176/pu/vid/avc1/1280x720/n8Cg0Rt5mK_gE5RL.mp4

6/41
@heypearlai
4. Choosing a Dataset

You have the option to select from 3 types of datasets:

• Folder Path (local folders)
• YouTube URL (YouTube videos)
• AI model default (default chatbot)



https://video.twimg.com/ext_tw_video/1873691073924476928/pu/vid/avc1/1280x720/LUjKHKGAuWoz3tvk.mp4

7/41
@heypearlai
5. Privacy

For those who prioritize privacy while using apps like ChatGPT, this solution is perfect.

Chat with RTX operates locally without needing an internet connection, ensuring the user's data remains private.



https://video.twimg.com/ext_tw_video/1873691183429414912/pu/vid/avc1/1280x720/sWZG8A1Gj3WWxr72.mp4

8/41
@heypearlai
How to download:

1. Go to: NVIDIA ChatRTX
2. Download and install on your Windows PC



GgCwQPvXIAAwkjU.png


9/41
@heypearlai
That's a wrap.

If you found this thread valuable, follow me @heypearlai



10/41
@Tommi_Lindfors
Interesting. I went to test it immediately, but my laptop's GPU isn't sufficient. I'll set it up on a VM next week. Here are the specs, so others don't waste time on the same thing:

System Requirements
- Platform: Windows
- GPU: NVIDIA GeForce™ RTX 30 or 40 Series GPU, or NVIDIA RTX™ Ampere or Ada Generation GPU with at least 8GB of VRAM
- RAM: 16GB or greater
- OS: Windows 11
- Driver: 535.11 or later
- File Size: 11 GB
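A quick way to check the 8GB VRAM requirement is nvidia-smi's CSV query output. The `--query-gpu`/`--format` flags are standard nvidia-smi; the 8192 MiB threshold comes from the requirements above:

```python
# Check installed GPUs against the 8 GB VRAM requirement by parsing
# nvidia-smi's CSV output (lines like "12288 MiB").
import subprocess


def vram_meets_requirement(smi_line: str, minimum_mib: int = 8192) -> bool:
    """Parse a 'memory.total' CSV value such as '12288 MiB' and compare."""
    total_mib = int(smi_line.strip().split()[0])
    return total_mib >= minimum_mib


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            print(line, "-> OK" if vram_meets_requirement(line) else "-> below 8 GB")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not found; is an NVIDIA driver installed?")
```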



11/41
@i8art_
@AndreSeverini



12/41
@0xTheWay
super cool



13/41
@MrlolDev
this has been out for a few months, i know cause I installed it and it sucks, ollama is better



14/41
@madcatattack1
i just use grok. it's pretty easy to set up and use



15/41
@ironbyte122
Too good to be true. Wow



16/41
@Acrominion
Nothing is free.



17/41
@richardpasqua
They simplified the whole local AI process. Love it, thanks!



18/41
@skilver5
Sounds great, have to test this on my repos and compare to ChatGPT 😇



19/41
@Adina_Coder
Excellent post! Packed with knowledge and practical advice.



20/41
@TheJustinify
Nice share. If they add agent capabilities that’d really put them ahead.



21/41
@RoxanneBT
I use grok



22/41
@in4hawaii
Mac?



23/41
@protean_onion
@nvidia MAKE THIS AVAILABLE FOR LINUX!



24/41
@PhDcornerHub
Until VPNs provide complete freedom and robust control over data privacy, they remain a limited solution for enterprises. While they may serve as a useful tool for individual consumers, enterprises require a more advanced and tailored approach to meet their complex privacy and security needs. A different, enterprise-grade solution is necessary to address these challenges effectively. @opherbrayer @TelAvivUni



GgEPMrLXUAA8iNE.jpg

GgEPMrOW8AAX2Az.jpg

GgEPMrEWcAA4CDR.jpg


25/41
@BadFox4042
@TM72ALI



26/41
@nordin_eth
Big



27/41
@djwaters
Can @elonmusk make something better?



28/41
@richeddy
@readwiseio save thread



29/41
@trianglerosmi
No.



30/41
@Jaidchen
free for everyone¹

¹ who happens to already own a 4× RTX 3090 cluster



31/41
@CryptoShlong
Free is innovative these days.



32/41
@_dan_sweet
@UnrollHelper



33/41
@threadreaderapp
@_dan_sweet Salam, here is your unroll: Thread by @heypearlai on Thread Reader App Share this if you think it's interesting. 🤖



34/41
@06mohitify
@threadreaderapp unroll



35/41
@threadreaderapp
@06mohitify Hi! please find the unroll here: Thread by @heypearlai on Thread Reader App Have a good day. 🤖



36/41
@vasgeo310
@threadreaderapp unroll



37/41
@threadreaderapp
@vasgeo310 Salam, you can read it here: Thread by @heypearlai on Thread Reader App Have a good day. 🤖



38/41
@eastvillagetwt
@threadreaderapp unroll



39/41
@threadreaderapp
@eastvillagetwt Bonjour, the unroll you asked for: Thread by @heypearlai on Thread Reader App Have a good day. 🤖



40/41
@PeterS80148175
@threadreaderapp unroll



41/41
@threadreaderapp
@PeterS80148175 Hi, the unroll you asked for: Thread by @heypearlai on Thread Reader App Have a good day. 🤖



