DeepMind Achieves Breakthrough in Mathematical Problem Solving — AI's Next Big Challenge

Machine Learning


The questions in the International Mathematical Olympiad come from a variety of fields in mathematics. Credit: David Wong/South China Morning Post via Getty

Google DeepMind, which has beaten humans at strategy board games such as Go, now says it is on the cusp of beating the world's best students at solving math problems.

The London-based machine-learning company announced on July 25 that its artificial intelligence (AI) systems had solved four of the six problems posed to students at the 2024 International Mathematical Olympiad (IMO), held this month in Bath, England. The AI produced rigorous, step-by-step proofs, which were graded by two top mathematicians and earned a score of 28 out of 42, just one point shy of the gold-medal range.

“This is clearly a huge advance,” said Joseph Myers, a mathematician based in Cambridge, UK, who reviewed the solutions together with Fields Medal winner Tim Gowers and who helped select the original problems for this year's IMO.

DeepMind and other companies are racing to build machines that can prove mathematical statements, with the ultimate goal of cracking important open problems in mathematics. The problems posed at the IMO, the world's premier competition for young mathematicians, have become a benchmark for progress toward that goal and have come to be regarded as a “grand challenge” for machine learning, the company said.

“This is the first time an AI system has been able to achieve medal-level performance,” Pushmeet Kohli, DeepMind's vice-president of AI for science, said at a press conference. “This is a significant milestone on the journey to building advanced theorem provers.”

Branching out

Just a few months ago, in January, DeepMind's AlphaGeometry system achieved medalist-level performance on one type of IMO problem: Euclidean geometry. The first AI to achieve gold-medal-level performance on the full test, which includes problems in algebra, combinatorics, and number theory that are generally considered harder than geometry, will be eligible for a $5-million award called the AI Mathematical Olympiad (AIMO) Prize. (The prize has strict criteria, including requirements to open-source the code and to run with limited computing power, which make DeepMind's current effort ineligible.)

In their latest effort, the researchers solved the competition's geometry problem in under 20 seconds using AlphaGeometry 2, an improved and faster version of their record-setting system, says DeepMind computer scientist Thang Luong.

For the other types of problem, the team developed an entirely new system called AlphaProof. It solved the competition's two algebra problems and one number-theory problem over three days (human participants are given two 4.5-hour sessions), but could not crack the two problems in combinatorics, another branch of mathematics.

The Mathematics Olympiad is the world's premier competition for school-age mathematical geniuses. Credit: MoiraM/Alamy

Researchers have tried using language models, the systems that power chatbots such as ChatGPT, to answer mathematical questions, with mixed results: the models sometimes give the right answer but cannot rationally explain how they got it, and sometimes spit out answers that make no sense.

Last week, a team of researchers from the software companies Numina and Hugging Face used a language model to win an intermediate AIMO “progress prize” for work on a simplified version of the IMO problems. The companies have open-sourced their entire system, making it available for other researchers to download. But the winners told Nature that language models alone would probably not be enough to solve harder problems.

A-class solver

AlphaProof combines a language model with the reinforcement-learning techniques of the company's AlphaZero engine, which DeepMind has successfully applied to games such as Go and to certain mathematical problems. In reinforcement learning, a neural network learns by trial and error, which works when its answers can be scored against an objective criterion. To provide one, AlphaProof was trained to read and write proofs in Lean, a formal language used by a popular “proof assistant” software package of the same name. AlphaProof could then test whether its output was correct by running it through the Lean package, which can also help fill in some steps in the code.
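To give a flavor of what “reading and writing proofs in Lean” means, here is a minimal illustrative example (ours, not DeepMind's): a trivial statement, far simpler than any IMO problem, written so that the Lean checker can verify every step mechanically.

```lean
-- A toy Lean 4 theorem: addition of natural numbers is commutative.
-- Lean accepts this file only if the proof is complete and correct;
-- any gap or wrong step is rejected, giving an objective success signal.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  -- `Nat.add_comm` is a standard-library lemma; `exact` tells Lean
  -- that it closes the goal precisely.
  exact Nat.add_comm a b
```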

Training a language model requires huge amounts of data, but few mathematical proofs were available in Lean. To solve this problem, the team built an additional network that attempted to translate an existing record of one million problems written in natural language into Lean, without their human-written solutions, says Thomas Hubert, a machine-learning researcher at DeepMind who co-led the development of AlphaProof. “Our approach was: can we learn to prove things without having to train on proofs written by humans?” (The company took a similar approach with Go, where its AI learned the game by playing against itself rather than from human play.)
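The shape of that verify-and-reinforce loop can be sketched in a few lines of Python. This is a toy stand-in based only on the article's description, not DeepMind's code: a stub “checker” plays the role of the Lean package, and a two-entry probability table plays the role of the neural network.

```python
import random

# Stand-in for the Lean proof assistant: an objective check that either
# accepts or rejects a proof attempt. (Stub for illustration only.)
def checker_accepts(attempt: str) -> bool:
    return attempt == "valid-proof"

# Stand-in for the policy network: a probability over candidate attempts.
policy = {"valid-proof": 0.5, "dead-end": 0.5}

def sample_attempt() -> str:
    attempts = list(policy)
    return random.choices(attempts, weights=[policy[a] for a in attempts])[0]

def reinforce(attempt: str, reward: float, lr: float = 0.1) -> None:
    # Shift probability mass toward attempts that earned a reward,
    # then renormalize so the values remain a probability distribution.
    policy[attempt] += lr * reward
    total = sum(policy.values())
    for key in policy:
        policy[key] /= total

# Trial and error: attempts the checker accepts become more likely.
for _ in range(200):
    attempt = sample_attempt()
    reward = 1.0 if checker_accepts(attempt) else 0.0
    reinforce(attempt, reward)

print(policy)  # "valid-proof" now carries almost all of the probability
```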

Magic Key

While many of the network's Lean translations were gibberish, some were good enough to bring AlphaProof to a level at which it could begin its reinforcement-learning cycle. The results far exceeded expectations, Gowers said at the press conference. “A lot of IMO problems have this ‘magic key’ property: the problem seems hard at first, until you find the magic key that unlocks it,” said Gowers, who is at the Collège de France in Paris.

In some cases, AlphaProof appeared to go a step further, showing a spark of creativity by finding the right step among an effectively infinite number of possibilities. But Gowers added that further analysis would be needed to confirm whether those steps were really as surprising as they seemed. A similar debate erupted after DeepMind's AlphaGo bot played the astonishing “move 37” in its famous 2016 match against one of the world's top Go players, a watershed moment for AI.

It remains to be seen whether the technique is mature enough to tackle research-level mathematics, Myers said at the press conference. “Could it be applied to other kinds of math, where there might not be a million problems to train on?”

“We're at the point where it can prove results that are not unsolved research problems, but that are very hard, at least for the best young mathematicians in the world,” said David Silver, a computer scientist at DeepMind who was a lead researcher on AlphaGo in the mid-2010s.


