Model Release

Nvidia Unveils Cosmos 3 World Model and Open Humanoid Robot at GTC Taipei

Nvidia announced Cosmos 3, a new world model, at GTC Taipei, offering three variants with different performance and inference capabilities.

Image: The Decoder

At GTC Taipei, Nvidia introduced a series of models aimed at advancing physical AI applications in robotics, autonomous vehicles, and video systems. The centerpiece of the event was the new world model, Cosmos 3, which is part of Nvidia's open 'omnimodel' series. Cosmos 3 processes text, images, video, ambient audio, and action data in a single system, allowing developers to generate synthetic training data, interpret scenes, and predict future world states without replicating real-world scenarios.

The model is designed for three main use cases: as a vision-language model to analyze video, as a world model to generate photorealistic video sequences of rare situations, and as a foundation for world-action models that produce numerical motion data for robots. The architecture combines a reasoning transformer with a generation transformer to analyze scenes and produce videos, descriptions, or motion trajectories. Training data included billions of examples across multiple modalities.

Nvidia released three variants of Cosmos 3: Cosmos 3 Super for the best quality, Nano for fast inference, and a forthcoming Edge model for real-time operation on embedded systems. The models are available under the OpenMDW-1.1 license on Hugging Face and GitHub.

Source: thedecoder

Key points

Nvidia announced Cosmos 3, a new world model, at GTC Taipei.
Cosmos 3 processes text, images, video, ambient audio, and action data in a single system.
The model is designed for three main use cases: as a vision-language model, a world model, and a foundation for world-action models.
Nvidia released three variants of Cosmos 3: Cosmos 3 Super, Nano, and a forthcoming Edge model.
The models are available under the OpenMDW-1.1 license on Hugging Face and GitHub.
Alpamayo 2 Super is meant to be a teacher model for robotaxis, with 32 billion parameters.
Nvidia is also releasing an open humanoid robot built on a Unitree chassis.

Source: The Decoder Read the original →

WRITTEN BY

Alex Lindgren

LLMs & Frontier Models

Alex covers the large language models and their impact on society.

Nvidia Unveils Cosmos 3 World Model and Open Humanoid Robot at GTC Taipei

Key points

Related articles

Google Renames NotebookLM as Gemini Notebook

Google Vids Update Lets Users Star in AI Videos

LightOn-rerank: 2B Model Reranks Text and Document Pages

Google Introduces Gemini Omni and Personal Avatars in Google Vids