NVIDIA has announced its Cosmos 3 world model at the ongoing GTC Taipei, offering a first look at what it calls the world’s first “fully open omnimodel.” The model is capable of vision-based reasoning and supports multimodal output in the form of text, image, video, and ambient sound. Cosmos 3 “pairs a reasoning transformer with an expert generation transformer,” allowing the model to grasp physical interactions before generating video and action content that leverages those interactions. At its heart, Cosmos 3 tackles the challenge of making robots, autonomous vehicles (AVs), and vision agents understand their surroundings in an […]
Read full article at https://wccftech.com/nvidia-calls-cosmos-3-the-worlds-first-fully-open-omnimodel-as-robots-and-autonomous-vehicles-get-a-powerful-brain-grounded-in-physics/
