NVIDIA Introduces Cosmos World Foundation Models for Physical AI Development
NVIDIA has released new NVIDIA Cosmos™ world foundation models (WFMs), announced on March 18, 2025, at the GTC event in San Jose, California, and aimed at advancing physical AI development. The release includes an open, customizable reasoning model that gives developers detailed control over virtual world generation. Alongside this, NVIDIA unveiled two blueprints built on the NVIDIA Omniverse™ and Cosmos platforms, designed to generate large-scale synthetic data for training robots and autonomous vehicles. Early adopters include 1X, Agility Robotics, Figure AI, Foretellix, Skild AI, and Uber.
A Shift in Physical AI Tools
NVIDIA CEO Jensen Huang highlighted the models’ purpose, stating, “Just as large language models revolutionized generative and agentic AI, Cosmos world foundation models are a breakthrough for physical AI.”
The Cosmos WFMs are intended to support robotics and physical industries by enabling developers to create tailored, photorealistic environments and datasets. The approach mirrors the role large language models play in digital AI, applied here to physical systems.
Cosmos Transfer: Synthetic Data Capabilities
How It Works: Cosmos Transfer WFMs process structured video inputs—such as segmentation maps, depth maps, lidar scans, pose estimation maps, and trajectory maps—producing photorealistic video outputs. This functionality aids perception AI training by turning 3D simulations or ground truth data from Omniverse into synthetic videos, offering a controlled, scalable data generation method.
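To make the input side of this concrete, here is a minimal Python sketch of how a set of structured control videos might be bundled and sanity-checked before being handed to a video-to-video model. It is an illustration only, not the Cosmos Transfer API: the `ControlBundle` class, the modality identifiers, and the nested-list frame format are all hypothetical names chosen for this example. The only thing taken from the article is the list of input types.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Modalities named in the article; the identifiers themselves are hypothetical.
SUPPORTED_MODALITIES = {"segmentation", "depth", "lidar", "pose", "trajectory"}

# Toy stand-in for real sensor data: one 2D map of floats per frame.
Frame = List[List[float]]


@dataclass
class ControlBundle:
    """Hypothetical container for structured control videos (not a Cosmos API)."""

    videos: Dict[str, List[Frame]] = field(default_factory=dict)

    def validate(self) -> int:
        """Return the shared frame count, or raise ValueError if inconsistent."""
        if not self.videos:
            raise ValueError("at least one control modality is required")
        unknown = set(self.videos) - SUPPORTED_MODALITIES
        if unknown:
            raise ValueError(f"unsupported modalities: {sorted(unknown)}")
        # Every control video must cover the same number of frames so the
        # modalities stay aligned in time.
        counts = {name: len(frames) for name, frames in self.videos.items()}
        if len(set(counts.values())) != 1:
            raise ValueError(f"frame counts differ: {counts}")
        return next(iter(counts.values()))


# Usage: two aligned two-frame control videos pass validation.
bundle = ControlBundle({
    "depth": [[[0.0]], [[1.0]]],
    "segmentation": [[[1.0]], [[2.0]]],
})
frame_count = bundle.validate()
```

The check that matters here is temporal alignment: a depth map and a segmentation map only constrain the same output frame if they refer to the same moment in the simulation.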
Industry Use: Agility Robotics is among the first to adopt Cosmos Transfer. Pras Velagapudi, the company’s chief technology officer, said, “Cosmos offers us an opportunity to scale our photorealistic training data beyond what we can feasibly collect in the real world.” He noted the potential to integrate existing physics-based simulation data with the platform. The NVIDIA Omniverse Blueprint for autonomous vehicle simulation uses Cosmos Transfer to vary sensor data, allowing Foretellix to adjust conditions like weather and lighting. Parallel Domain is also applying this blueprint to its sensor simulation work.
GR00T Blueprint for Robotics: The NVIDIA GR00T Blueprint, powered by Cosmos Transfer and Omniverse, focuses on synthetic manipulation motion generation. Using OpenUSD-powered simulations, it produces diverse datasets quickly, reducing data collection and augmentation times from days to hours, which could streamline robotics development.
Cosmos Predict: Generating Virtual Worlds
Model Features: First showcased at CES in January 2025, Cosmos Predict WFMs now support multi-frame generation, predicting intermediate actions or trajectories from start and end input images. Built for post-training, these models accept multimodal inputs such as text, images, and video, and can be customized with NVIDIA’s open physical AI dataset. They run on NVIDIA Grace Blackwell NVL72 systems for real-time world generation.
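The shape of the multi-frame task (two endpoint images in, a sequence of in-between frames out) can be illustrated with a deliberately naive baseline. The Python sketch below uses a plain linear cross-fade between the start and end images. This is not how Cosmos Predict works: a world model is a learned generative model that predicts plausible motion, while a cross-fade only blends pixel values. The function name `crossfade_frames` and the nested-list image format are assumptions made for this example.

```python
from typing import List

# Grayscale image as rows of pixel intensities (toy format for illustration).
Image = List[List[float]]


def crossfade_frames(start: Image, end: Image, num_intermediate: int) -> List[Image]:
    """Return `num_intermediate` frames linearly blended between start and end.

    Naive baseline only: it shows the input/output contract of in-between
    frame generation, not a learned prediction of motion.
    """
    if num_intermediate < 0:
        raise ValueError("num_intermediate must be non-negative")
    if len(start) != len(end) or any(len(a) != len(b) for a, b in zip(start, end)):
        raise ValueError("start and end images must have the same shape")

    frames: List[Image] = []
    for k in range(1, num_intermediate + 1):
        # Blend weight moves evenly from the start image toward the end image.
        t = k / (num_intermediate + 1)
        frames.append([
            [(1 - t) * s + t * e for s, e in zip(start_row, end_row)]
            for start_row, end_row in zip(start, end)
        ])
    return frames


# Usage: three in-between frames for a 1x2 image.
middle = crossfade_frames([[0.0, 0.0]], [[4.0, 8.0]], 3)
```

A cross-fade produces ghosting whenever objects move between the two endpoints, which is the gap a learned model that predicts trajectories is meant to close.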
Practical Applications: 1X is using Cosmos Predict and Transfer to train its NEO Gamma humanoid robot, while Skild AI applies Cosmos Transfer to expand its synthetic datasets. In autonomous driving, Nexar and Oxa are employing Cosmos Predict to refine their systems, demonstrating the model’s range of uses.
Cosmos Reason: Reasoning in Physical Contexts
Model Capabilities: Cosmos Reason, an open and customizable WFM, features spatiotemporal awareness and uses chain-of-thought reasoning to analyze video data and predict interaction outcomes in natural language—such as a pedestrian crossing a street or a box falling. It’s designed for data annotation, curation, and building high-level planners for physical AI tasks.
Developer Options: Developers can use Cosmos Reason to refine existing WFMs or create new vision-language-action models. Available in early access, it reflects NVIDIA’s approach to gathering feedback for further refinement.
Supporting Data Curation and Training
Cosmos WFMs can be post-trained with PyTorch scripts or the NVIDIA NeMo™ framework on NVIDIA DGX™ Cloud, allowing customization for specific tasks. The NVIDIA NeMo Curator, also on DGX Cloud, speeds up data processing, aiding companies like Linker Vision and Milestone Systems in preparing video data for visual agents. Virtual Incision is exploring its use for surgical robotics, while Uber and Waabi apply it to autonomous vehicle projects.
Focus on Responsible AI
NVIDIA has integrated open guardrails into Cosmos WFMs to align with its trustworthy AI principles. In partnership with Google DeepMind, it’s adding SynthID watermarking to identify AI-generated outputs, aiming to enhance transparency in content creation.
Availability Details
Cosmos WFMs are available for preview in the NVIDIA API catalog and listed in Google Cloud’s Vertex AI Model Garden. Cosmos Predict and Transfer are accessible on Hugging Face and GitHub, while Cosmos Reason remains in early access, suggesting a gradual rollout to refine the tools.
Context and Outlook
NVIDIA’s Cosmos WFMs provide developers with new ways to create realistic, controllable training environments for physical AI, drawing interest from leading firms in robotics and autonomous systems. The emphasis on synthetic data could reduce reliance on real-world collection, though its effectiveness compared to actual data will likely be tested over time. NVIDIA’s focus on customization and transparency aligns with broader industry trends, but the technology’s influence on standards and practices will depend on its performance in real-world applications.
Source: Nvidia