At GTC Taipei, Nvidia introduced a series of models aimed at advancing physical AI applications in robotics, autonomous vehicles, and video systems. The centerpiece of the event was the new world model Cosmos 3, part of Nvidia's open 'omnimodel' series. Cosmos 3 processes text, images, video, ambient audio, and action data in a single system, allowing developers to generate synthetic training data, interpret scenes, and predict future world states without having to recreate those scenarios in the real world. The model is designed for three main use cases: as a vision-language model that analyzes video, as a world model that generates photorealistic video sequences of rare situations, and as a foundation for world-action models that produce numerical motion data for robots. Its architecture pairs a reasoning transformer, which analyzes scenes, with a generation transformer, which produces videos, descriptions, or motion trajectories. The training data comprised billions of examples across multiple modalities. Nvidia announced three variants: Cosmos 3 Super for the highest quality, Cosmos 3 Nano for fast inference, and a forthcoming Cosmos 3 Edge for real-time operation on embedded systems. The models are available under the OpenMDW-1.1 license on Hugging Face and GitHub. *Source: [The Decoder](https://the-decoder.com/nvidia-bets-big-on-physical-ai-at-gtc-taipei-with-a-new-world-model-driving-brain-and-open-humanoid-robot/)*