You may have seen it before. A humanoid robot moonwalks across the stage to Michael Jackson’s “Billie Jean,” then repeatedly slides down a series of steps and lies motionless. The video, shot in Shenzhen, China, took social media by storm this week.
This scene may foreshadow a future in which robots imitate humans, act alongside them, and work with them. The performance is an early example of physical AI, which many experts are calling the next phase of the artificial intelligence boom.
What is physical AI?
The term “physical AI” is believed to have been coined by NVIDIA CEO Jensen Huang to describe AI's move from digital screens into the real world. In a recent company blog post, Huang suggested that “the ChatGPT moment in general robotics is just around the corner.”
According to Yanzhi Wang, professor of electrical and computer engineering, physical AI refers to AI systems designed to interact with the environment, usually through specialized sensors.
Examples of physical AI systems extend beyond robotics and include medical devices, self-driving cars, smart manufacturing systems, AI-powered drones, and more.
What does it mean for an AI to interact with its environment?
Experts describe that interaction in terms of the system’s ability to “perceive, reason, and learn” from its surrounding environment. These AI systems will be able to learn the laws of physics with some degree of autonomy and adaptability.
Sarah Ostadavas, associate professor of electrical and computer engineering, said physical AI systems should theoretically be able to not only “sense and learn from the world” but also “act independently” based on information they take in from their surroundings.
However, bridging the gap between simulation, or the virtual world, and the real world requires the ability to “reason about what you saw or perceived. So this element of reasoning is really important,” she added.
Ostadavas explained that inference models rely heavily on understanding text, or language. Language-based systems make inferences from descriptions and patterns rather than direct interaction with the physical world. “We hope that this inference component of these systems will ultimately come from the actual physics of the world,” she said.
At Northeastern University’s Physics AI Research Initiative (PAIR), Ostadavas and her colleagues are establishing a framework to guide the development of physical AI systems.
One of the emerging templates for physical AI systems is the “vision-language-action” (VLA) model, Wang said. A vision-language-action model describes a system that integrates visual perception and language processing to make decisions and act. Early models, such as NVIDIA’s GR00T N1 and Google DeepMind’s RT-1, were designed to help robots interpret their surroundings.
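To make the idea concrete, here is a minimal sketch in Python of how such a system might combine what it sees with what it is told before acting. It is a toy illustration only, not how GR00T N1, RT-1, or any real model works. The `Observation` class, the `choose_action` function, the action strings, and the 0.5-meter grasp distance are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical camera output: lowercase object name -> distance in meters."""
    objects: dict[str, float]


def choose_action(obs: Observation, instruction: str) -> str:
    """Toy stand-in for a vision-language-action policy (an assumption, not a real model)."""
    # Language step: find a visible object that the instruction mentions.
    text = instruction.lower()
    target = next((name for name in obs.objects if name in text), None)
    if target is None:
        return "search"  # nothing relevant in view, so keep looking
    # Vision step: use the perceived distance to decide what to do next.
    if obs.objects[target] > 0.5:  # assumed grasp range of 0.5 m
        return f"move_toward:{target}"
    return f"grasp:{target}"


if __name__ == "__main__":
    print(choose_action(Observation({"cup": 1.2}), "Pick up the cup"))
```

A real system would replace the keyword match and the distance check with learned models, but the overall shape, perception and language in, an action out, is the same.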
What are some applications of physical AI?
Physical AI is already being deployed in several sectors, including manufacturing, Wang said. The best-known examples include robotic arms that assemble products on factory lines and autonomous warehouse robots that help transport inventory, sort packages, and perform other mechanical tasks with minimal human intervention.
Unlike traditional industrial robots, which are typically programmed to repeat the same fixed movements, physical AI systems are designed to adapt to changing conditions, identify objects, and move independently through space.
Wang said physical AI systems have the potential to transform manufacturing and other industries by enabling machines to operate, learn and adapt in unpredictable environments.
How far does physical AI need to evolve?
Physical AI is still largely theoretical in nature. Ostadavas said there are many hurdles to achieving the kinds of physical AI systems she and her colleagues are trying to define and conceptualize.
One of the challenges, she said, is the dynamic and unpredictable nature of the real world. The visual and physical data that these systems rely on is often “unclean” or “dirty,” because real environments change constantly and contain obstacles and other unexpected variables.
Safety is another concern. Ostadavas said physical AI systems operating around humans must avoid causing harm or jeopardizing trust, which raises a number of technical and legal issues.
China’s dancing robots seemed harmless enough. However, in other situations, such systems
“How do we make sure this action is safe, reliable, verifiable, and robust?” Ostadavas said. “This is the final pillar of our framework.”
Wang believes that if innovation and development continue at the current pace, large-scale adoption could occur soon.
“I think it will become more mainstream, but we still have a long way to go,” Wang said. “But…given the current advancements in this generation of tools, we will probably see significant advances in the next two to three years.”

