“World Models,” an Old Idea in AI, Mount a Comeback

Machine Learning


The latest ambition in artificial intelligence research, especially in labs chasing “artificial general intelligence,” or AGI, is something called a world model: a representation of an environment that an AI carries inside itself, like a computational snow globe. An AI system can use this simplified representation to evaluate predictions and decisions before applying them to real-world tasks. Deep-learning luminaries Yann LeCun (Meta), Demis Hassabis (Google DeepMind), and Yoshua Bengio (Mila, the Quebec Artificial Intelligence Institute) all believe that world models are essential for building AI systems that are truly smart, scientific, and safe.

The fields of psychology, robotics, and machine learning have each used versions of the concept for decades. You probably have a world model running inside your own skull: it’s how you know not to step in front of a moving train without having to run the experiment first.

Does this mean AI researchers have finally found a core concept that everyone can agree on? As a famous physicist once wrote: Surely you’re joking. World models may sound simple, but as usual, no one can agree on the details. What is represented in the model, and at what level of fidelity? Is it innate, learned, or a combination of the two? And how can you tell it’s actually there?

It helps to know where the whole idea started. In 1943, decades before the term “artificial intelligence” was coined, a 29-year-old Scottish psychologist named Kenneth Craik published an influential monograph. In it, he argued that if an organism carries a “small-scale model” of external reality within its head, it can try out various alternatives, conclude which is the best of them, and react in a much fuller, safer, and more competent way. Craik’s concept of mental models, or simulations, anticipated the “cognitive revolution” that transformed psychology in the 1950s and still dominates cognitive science today. It also directly linked cognition with computation: Craik considered the “power to parallel or model external events” to be the fundamental feature of both neural machinery and calculating machines.
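Craik’s proposal is, at heart, an algorithm: simulate candidate actions inside the internal model, score the predicted outcomes, and act on the best one. Below is a minimal sketch of that loop in Python; the grid world, action set, and scoring rule are invented for illustration, not taken from Craik or from any particular AI system.

```python
# A minimal sketch of Craik-style planning: an agent carries a "small-scale
# model" of its environment and tries out actions internally before acting.
# The grid world, actions, and scoring rule here are hypothetical.

def simulate(state, action):
    """Internal model: predicts the next state without touching the real world."""
    x, y = state
    moves = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}
    dx, dy = moves[action]
    return (x + dx, y + dy)

def score(state, goal):
    """How desirable a predicted state is (negative distance to the goal)."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def plan(state, goal, actions=("north", "south", "east", "west")):
    """Try out various alternatives in the model and conclude which is best."""
    return max(actions, key=lambda a: score(simulate(state, a), goal))

print(plan(state=(0, 0), goal=(3, 5)))  # -> "north" (tied with "east")
```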

The young field of artificial intelligence eagerly adopted the world-modeling approach. In the late 1960s, an AI system called SHRDLU used a rudimentary model of a “blocks world” to answer questions like “Can a pyramid support a block?” But these hand-built world models couldn’t scale up to handle the complexity of more realistic settings. By the late 1980s, the AI and robotics pioneer Rodney Brooks had given up on world models entirely, asserting that “the world is its own best model” and that explicit representations “simply get in the way.”

The rise of machine learning based on deep artificial neural networks brought Craik’s brainchild back to life. Instead of relying on brittle hand-coded rules, deep neural networks could construct internal approximations of their training environments through trial and error, and use them to accomplish narrowly specified tasks such as driving a virtual race car. In the last few years, as the large language models behind chatbots like ChatGPT began displaying emergent abilities they were never explicitly trained for, such as guessing the title of a movie from a string of emojis or playing the board game Othello, world models offered a handy explanation for the mystery. To noted AI experts like Geoffrey Hinton, Ilya Sutskever, and Chris Olah, it seemed clear: buried somewhere deep in the thickets of virtual neurons inside an LLM must be “a small-scale model of external reality,” just as Craik had imagined.

The truth, at least so far, is less impressive. Instead of world models, today’s generative AIs appear to learn “bags of heuristics”: disconnected rules of thumb that can approximate responses to particular scenarios but don’t cohere into a consistent whole. (Some of them may even contradict each other.) It’s much like the parable of the blind men and the elephant, in which each man touches only one part of the animal at a time and cannot grasp its full form. One man feels the trunk and assumes the whole elephant is like a snake. Another touches a leg and guesses it’s more like a tree. A third grabs the tail and declares it’s a rope. When researchers try to recover evidence of a world model from inside an LLM, they are looking for the whole elephant: a coherent computational representation of, say, an Othello game board. Instead, what they find is a bit of snake here, a chunk of tree there, and a few pieces of rope.
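How do researchers go looking for the elephant in the first place? The usual tool is a “probe”: a small classifier trained to read some fact about the world, say the state of one Othello square, out of a network’s hidden activations. The sketch below illustrates the idea with random arrays standing in for real activations and a linear probe for simplicity; actual studies of Othello-playing models record activations from trained networks and have used both linear and nonlinear probes.

```python
# A minimal sketch of activation probing, with stand-in data. In real work
# the activations would be recorded from a trained network at many game
# positions; here they are random, so the probe should land near chance.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_positions, hidden_dim = 2000, 512

# Hypothetical data: hidden activations at 2000 game positions, plus the
# true state of one board square (0=empty, 1=black, 2=white) at each.
activations = rng.normal(size=(n_positions, hidden_dim))
square_state = rng.integers(0, 3, size=n_positions)

# If a probe can predict the square's state from the activations far above
# chance, the network plausibly encodes that piece of the "world".
probe = LogisticRegression(max_iter=1000)
probe.fit(activations[:1500], square_state[:1500])
accuracy = probe.score(activations[1500:], square_state[1500:])
print(f"probe accuracy: {accuracy:.2f} (chance is ~0.33)")
```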

Not that such heuristics are worthless. LLMs can encode untold numbers of them within their trillions of parameters, and as the old saw goes, quantity has a quality all its own. That’s what allows a language model to generate nearly perfect turn-by-turn directions between any two points in Manhattan without learning a coherent world model of the street network in the process, as researchers at Harvard University and the Massachusetts Institute of Technology recently discovered.

So if pieces of snake, tree, and rope can get the job done, why bother with the elephant? In a word, robustness. When the researchers threw that Manhattan-navigating model a mild curveball by randomly blocking just 1 percent of the streets, its performance cratered. If the AI had encoded a street map that was consistent in its particulars, instead of an incredibly complicated patchwork of best guesses for every corner and intersection, it could have rerouted around the blockages far more easily.
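The difference is easy to see in miniature. In the sketch below, a toy grid graph stands in for Manhattan (the actual study used real street data and a trained transformer): a memorized turn-by-turn route breaks as soon as one of its segments is closed, while an agent that holds the map itself simply re-plans.

```python
# A minimal sketch of why a consistent map beats a patchwork of heuristics.
# The 20x20 grid and the 1-percent closures are stand-ins for the study's setup.
import random
import networkx as nx

grid = nx.grid_2d_graph(20, 20)            # a toy "street network"
start, goal = (0, 0), (19, 19)

memorized = nx.shortest_path(grid, start, goal)   # a fixed turn-by-turn route

random.seed(1)
blocked = random.sample(list(grid.edges), k=len(grid.edges) // 100)
grid.remove_edges_from(blocked)            # close ~1% of streets

# The memorized route breaks if any of its segments were closed...
route_ok = all(grid.has_edge(a, b) for a, b in zip(memorized, memorized[1:]))
print("memorized route still valid:", route_ok)

# ...but an agent holding the map itself can re-plan around the closures.
detour = nx.shortest_path(grid, start, goal)
print("re-planned route length:", len(detour) - 1)
```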

Given the advantages that even simple world models could confer, it’s easy to understand why every major AI lab is racing to develop them, and why academic researchers are increasingly keen to examine them, too. A robust, verifiable world model may not be AGI’s El Dorado, but it does look like a scientifically plausible tool for reducing AI hallucinations, enabling reliable reasoning, and increasing the interpretability of AI systems.

That covers the “what” and “why” of world models. The “how,” though, is still anyone’s guess. Google DeepMind and OpenAI are betting that world models will spontaneously crystallize within the statistical soup of neural networks given enough “multimodal” training data: video, 3D simulations, and other inputs beyond mere text. Meanwhile, Meta’s LeCun believes that an entirely new (and non-generative) AI architecture will provide the necessary scaffolding. Nobody has a crystal ball in the quest to build these computational snow globes. But the prize may be the one thing actually worth the AGI hype.


