While conversations about agentic AI and GenAI technology are pervasive, Nvidia and other IT vendors are also introducing technologies that support the physical AI ecosystem.
On Monday, Nvidia introduced new Omniverse SDKs for building and deploying industrial AI and robotics applications, as well as new world foundation models.
Omniverse and world models
The new Omniverse SDKs allow robot learning developers to simulate robots across platforms such as Universal Scene Description and MuJoCo, a physics engine used in robotics, biomechanics and machine learning.
The Omniverse NuRec libraries and AI models introduce new rendering techniques that allow developers to capture, reconstruct and simulate the real world using sensor data.
Nvidia Isaac Sim 5.0 and Nvidia Isaac Lab 2.2 are open source robot simulation frameworks now available on GitHub. Isaac Sim includes sensor schemas that robot developers can use to close the gap between simulation and reality.
Nvidia also revealed that its world foundation model Cosmos Transfer-2, coming soon, will simplify prompting and speed up synthetic data generation.
The AI vendor has also introduced a distilled version of Cosmos Transfer. It cuts the distillation process from 70 steps to one, so developers can run the model on an Nvidia RTX Pro server.
Adding to the open model market, the company has introduced Nvidia Cosmos Reason, a new open, customizable 7 billion-parameter reasoning vision language model for physical AI and robotics. The model allows robots and vision AI agents to reason like humans, Nvidia said.
The open model is intended for applications such as data curation and annotation, robot planning and reasoning, and video analytics AI agents.
Physical AI and robots
The release of these new physical AI models shows growing market interest as GenAI and agentic AI technology continues to mature.
“World models and all the technology ideas associated with them are the next big step in AI,” said Tuong Huy Nguyen, an analyst at Gartner. “We're not talking about anything mature yet or anything final. We're talking about the different techniques and architectures being built to help AI better understand, predict and respond to the world.”
World models, like the ones released by Nvidia, will help robots better understand how to interact with the world, Nguyen added.
They help robots handle the need to understand gravity, mass, velocity, light, sound and objects.
Nvidia is not the only vendor working in this market. On Tuesday, AI research vendor AI2 released a new class of models called Action Reasoning Models (ARMs) so that robots and machines can overcome some of the challenges and limitations of reasoning with language or vision language models. The first ARM is called MolmoAct, built on top of AI2's Molmo, an open source family of vision language models. MolmoAct bridges the gap between language and action, AI2 said. It helps robots and machines follow instructions.
The challenges with physical AI
The complexity of training physical AI technology requires specialized models such as those offered by Nvidia and AI2.
“The type of software that powers humanoids is extremely complicated,” said Ray Wang, an analyst at Futurum Group. “To train humanoids, we need a model specifically designed for physical AI workloads.”
He said that humanoids need to process images and objects.
Not only does Nvidia provide technology for developers to create physical AI applications, it also offers a growing AI technology ecosystem, Wang added.
He said that Nvidia's software and hardware technology is trusted, which gives the vendor a great advantage, but there is still more to do to make physical AI technology commercially viable for customers.
“We're not at maturity yet, but we can see that this software development has evolved rapidly over the past two or three years,” Wang said.
In related news, Ansys, the engineering simulation software vendor that is part of Synopsys, is providing access to Nvidia Omniverse technology within its software.
Esther Shittu is an Informa TechTarget news writer and podcast host who covers artificial intelligence software and systems.