Humans can still beat AI at video games



Ask someone to chart the advances in artificial intelligence (AI) models over the past few decades, and you’ll likely hear them mention how good they are at playing games. IBM shocked the world in 1997 when its Deep Blue machine defeated chess grandmaster Garry Kasparov in his own realm. Almost 20 years later, Google DeepMind’s AlphaGo model defeated a human champion at the game of Go, a feat that had seemed impossible at the time.

Since then, data-rich AI models have graduated from board games to video games. Various models have used a training technique called reinforcement learning (which also plays a key role in training AI chatbots such as ChatGPT) to learn to outperform humans at various Atari games. More recently, reinforcement learning has taught machines to master incredibly complex strategy games like Dota 2 and StarCraft II.

But there remains one area of gaming where computers still fall short of real humans, at least for now: they are not yet good at quickly learning a wide variety of more open-ended games. When it comes to picking up a random title you’ve never seen before from a game store and getting the gist of it, human gamers learn the ropes much faster than the most advanced AI models.

That’s the key argument made in a recent paper by New York University computer science professor Julian Togelius and his colleagues. They point out that this distinction isn’t just a pat on the back for Homo sapiens. It may also reveal key elements of what makes human intelligence so unique, and why AI still has a long way to go before it can truly claim human-level intelligence, let alone surpass it.

“Having an LLM [large language model] play a game it has never seen before will almost certainly result in failure,” the authors write.

AI has been obsessed with games from the beginning

Games have served as useful testbeds for AI models for decades because they typically have predictable rules, defined goals, and a variety of mechanics. Those properties make them especially well suited to reinforcement learning, which involves playing a game in simulation over and over (sometimes millions of times), gradually improving through trial and error until the model reaches proficiency. In a fundamental sense, this is how DeepMind was able to master Atari games in 2015. The same logic carries over to the large language models popular today, even when the entire Internet serves as the training data.
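To make the trial-and-error idea concrete, here is a minimal sketch in the style of tabular Q-learning. The ToyCorridor environment, its sparse reward, and the hyperparameters are illustrative assumptions for this article, not DeepMind’s actual Atari setup (which used deep neural networks rather than a lookup table):

```python
import random
from collections import defaultdict

class ToyCorridor:
    """Hypothetical stand-in for a game: walk right to reach a goal."""
    def __init__(self, length=10):
        self.length = length
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos = max(0, self.pos + action)
        done = self.pos >= self.length
        reward = 1.0 if done else 0.0  # sparse reward, as in many games
        return self.pos, reward, done

def q_learning(env, actions, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: improve value estimates by pure trial and error."""
    q = defaultdict(float)  # (state, action) -> estimated long-term reward
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore occasionally; otherwise exploit the best-known action.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge the estimate toward reward + discounted future value.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning(ToyCorridor(), actions=[-1, +1])
```

The loop never sees the game’s rules; it only learns that certain button presses in certain states eventually pay off, which is why it needs so many repetitions.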

Still, problems arise when these models are asked to generalize. AI models outperform humans at board games and certain video games because the constraints are clear and the goals are relatively simple. Togelius and his colleagues argue that, however impressive these models may look, they are merely very good at very specific tasks, and nothing more. Even a small change to a game’s overall design can cause a model’s performance to fall apart. Models may be superhuman at certain games, but they prove quite incompetent when asked to improvise.

The difference becomes even clearer when you consider the broader trend in modern gaming toward more open-ended and abstract titles. Consider the gap between chess and a high-budget, third-person, open-world western like Red Dead Redemption. Both are games in a fundamental sense, but success or victory means very different things in each. Red Dead Redemption has many missions with clearly defined solutions, such as shooting the bad guys or stealing horses. The overarching goal of the game, however, is not so simple. What does it mean to win when the central conceit is embodying a morally questionable Western outlaw?


Human gamers can intuitively understand that. Machines, not so much. The researchers point out that even in a relatively simple game like Minecraft, an AI model may know to jump from one block to another, yet have no idea what jumping actually means.

“In short, all well-designed games are expertly tuned to human abilities, intuition, and common sense,” the authors write.

Compared with machines, lived experience seems to be humans’ biggest advantage. The average gamer who downloads a new release may not have been trained by an office full of highly paid engineers in Patagonia vests, but they have spent years manipulating and understanding the objects, and the more abstract concepts, that they will encounter in-game. The authors point out that somewhere between 18 and 24 months old, human babies learn to recognize and identify individual objects simply by existing in the world. Machines need far more hand-holding.

All of this means humans learn new games faster. Previous research has shown that a game-playing AI model using curiosity-based reinforcement learning can require 4 million button presses to finish a game, equating to approximately 37 hours of continuous play. The average human gamer, by contrast, typically grasps even completely new mechanics within about 10 hours.
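Curiosity-based reinforcement learning, loosely speaking, supplements a game’s often sparse score with an intrinsic reward for reaching situations the agent cannot yet predict. Here is a minimal sketch of that signal; the linear ForwardModel and the vector-valued states and actions are illustrative assumptions, not the architecture used in the research cited above:

```python
import numpy as np

class ForwardModel:
    """Hypothetical learned predictor of the next state (a linear stub here)."""
    def __init__(self, dim):
        self.w = np.zeros((dim, dim))
    def predict(self, state, action):
        return self.w @ state + action
    def update(self, state, action, next_state, lr=0.01):
        # Simple gradient step that shrinks future prediction error.
        error = self.predict(state, action) - next_state
        self.w -= lr * np.outer(error, state)

def intrinsic_reward(model, state, action, next_state):
    # Curiosity = prediction error: novel, poorly understood situations
    # yield large error, so the agent is rewarded for seeking them out.
    predicted = model.predict(state, action)
    return float(np.mean((predicted - next_state) ** 2))

# Hypothetical usage with a 3-dimensional state.
model = ForwardModel(dim=3)
s, a, s_next = np.array([1.0, 0.0, 0.0]), np.ones(3), np.array([0.0, 2.0, 0.0])
bonus = intrinsic_reward(model, s, a, s_next)  # large while the model is naive
model.update(s, a, s_next)                     # error shrinks with experience
```

Because the prediction error fades as the agent comes to understand a region of the game, the bonus steadily steers it toward unexplored territory, racking up millions of button presses along the way.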

That said, game-playing AI definitely continues to evolve, even in more general settings. Just last year, Google DeepMind announced a model called SIMA 2, which the company describes as a significant step toward teaching AI to play 3D games in a more human-like manner, including games it has not been specifically trained on. A key advance was taking the existing model and integrating reasoning capabilities from Google’s Gemini large language model, a combination that lets it better understand and interact with new environments.

Togelius and his colleagues say these models still have plenty of ground to cover before they can be considered the equal of human gamers. Their proposed benchmark involves taking a model and having it play and win the top 100 games on Steam or the iOS App Store without any prior training, in about the same amount of time it would take a human. That’s a tall order.

“General video game playing, in the sense of playing the top 100 games on Steam or the iOS App Store while requiring no more play time than a human would, is a formidable challenge, and we are far from solving it or even seriously attempting it,” the authors write. “It’s not at all clear that current methods and models are suitable for this problem.”

Overcoming this challenge is of interest well beyond the gaming world. Togelius argues that a machine capable of such generalization would need to be capable of true creativity, forward planning, and abstract thinking, all qualities that would feel much more human-like than anything current AI models possess.

In other words, the true test of whether an AI has reached “human-level intelligence” may come not from generating deepfakes or writing a run-of-the-mill novel, but from playing a bunch of games.


Mack DeGeurin is a technology reporter who has spent years investigating where technology and politics collide. His work has appeared in Gizmodo, Insider, New York Magazine, and Vice.



