image:
Nim game state diagram. The left panel shows the initial board configuration of five heaps, [n1, n2, n3, n4, n5] = [1, 3, 5, 7, 9]. The center panel shows an intermediate board state during play, [v1, v2, v3, v4, v5] = [1, 2, 4, 4, 3], reached after the players have removed counters. The right panel shows the final state with all counters cleared, meaning victory for the player who made the last move.
Credit: Image by Dr. Bei Zhou, Research Fellow, Imperial College London, and Dr. Søren Riis, Reader in Computer Science, Queen Mary University of London.
- Researchers tested AlphaZero-style self-play with Nim, a mathematically solved children’s game.
- Despite extensive training, the agents had blind spots and could miss optimal moves.
- Good performance does not necessarily mean that the system has learned the underlying winning formula.
Embargo: Immediate.
New research published in Machine Learning shows that pattern learning alone is not enough to train AI to play some games, and that abstract representations and hybrid approaches can help.
Many AI researchers describe game playing as the “Formula 1” of AI: a controlled testing environment with clear rules and clear success criteria. The paper uses that idea as a diagnostic, studying Nim, a children’s matchstick game so simple that its optimal strategy is precisely known.
Since the correct move is known for every position, the researchers could measure whether an agent plays optimally across the whole state space. The study found that, despite intensive training and exploration, agents that do well on small boards still have blind spots that cause them to miss optimal moves. As the board grows, performance drops and predictions become closer to random. This suggests that impartial games may often require analytical representations rather than pattern learning alone.
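The known solution referred to above is the classical nim-sum rule: under normal play (the last player to move wins, as in the figure), the player to move is losing exactly when the bitwise XOR of the heap sizes is zero. The following is a minimal sketch of how that rule can serve as an oracle for checking a policy across every position. It is not the authors' code; the function names `nim_sum`, `winning_moves`, `optimal_move_rate` and `take_one_from_largest` are hypothetical, chosen for illustration.

```python
from functools import reduce
from itertools import product
from operator import xor


def nim_sum(heaps):
    """XOR of all heap sizes; zero means the player to move is losing."""
    return reduce(xor, heaps, 0)


def winning_moves(heaps):
    """Return (heap_index, new_size) pairs that leave a zero nim-sum."""
    total = nim_sum(heaps)
    moves = []
    for i, size in enumerate(heaps):
        target = size ^ total  # size this heap must shrink to
        if target < size:      # legal only if counters are removed
            moves.append((i, target))
    return moves


def optimal_move_rate(policy, max_heaps):
    """Fraction of winning positions in which `policy` picks a winning move.

    `policy` maps a tuple of heap sizes to a (heap_index, new_size) move.
    Every position with heap i holding 0..max_heaps[i] counters is checked.
    """
    hits = wins = 0
    for heaps in product(*(range(m + 1) for m in max_heaps)):
        best = winning_moves(heaps)
        if not best:  # losing position: no optimal move to find
            continue
        wins += 1
        hits += policy(heaps) in best
    return hits / wins


def take_one_from_largest(heaps):
    """Hypothetical naive policy: remove one counter from the largest heap."""
    i = max(range(len(heaps)), key=heaps.__getitem__)
    return (i, heaps[i] - 1)


if __name__ == "__main__":
    print(nim_sum([1, 3, 5, 7, 9]))        # 9: the first player can win
    print(winning_moves([1, 3, 5, 7, 9]))  # [(4, 0)]: take the whole 9-heap
    print(nim_sum([1, 2, 4, 4, 3]))        # 0: the player to move is losing
    print(optimal_move_rate(take_one_from_largest, (1, 3, 5)))
```

For the boards in the figure, the initial position [1, 3, 5, 7, 9] has exactly one winning move, while the intermediate position [1, 2, 4, 4, 3] is lost for whoever must move. A trained agent could be passed to `optimal_move_rate` in place of the naive policy, assuming it is wrapped to return moves in the same `(heap_index, new_size)` form.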
What does this mean for machine game playing?
Self-play AI is very powerful, but in games where both players share the same “pieces” and the winning strategy is an abstract arithmetic rule, pattern recognition from raw positions may not be enough.
Wider impact:
These results do not undermine the achievements of self-play AI in games such as Chess and Go. Rather, they help map where today’s methods may struggle and where more abstract representations or hybrid approaches may be beneficial. More broadly, they are a reminder that a system can perform well in the common case while remaining vulnerable in rare but important cases.
Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, said: “Although Nim is a children’s game with a full mathematical solution, AlphaZero-style self-play can still produce blind spots, leaving an agent competitive while it misses the best moves in many positions.”
“This suggests that in future AI research, good performance alone is not proof that the system has learned the underlying principles. Reducing blind spots may require methods that capture abstract structure.”
—
“Impartial Games: A Challenge for Reinforcement Learning” by Dr. Bei Zhou, Research Fellow at Imperial College London, and Dr. Søren Riis, Reader in Computer Science at Queen Mary University of London, has been published in the journal Machine Learning.
Research method
experimental research
Research theme
people
Article title
Impartial Games: A Challenge for Reinforcement Learning
Article publication date
March 13, 2026
Conflict of interest statement
No conflict of interest
Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.
