June 19, 2024
Beijing – Although artificial intelligence has surpassed humans in many areas, it still faces significant limitations in the field of mathematics.
In the preliminary round of the 2024 Alibaba World Math Competition, 563 teams used AI to solve the problems. To the surprise of AI advocates, none of the teams managed to score enough points to advance to the finals.
During the 48-hour preliminary round, AI and human participants received the same test questions, including multiple-choice, problem-solving and proof questions. The AI teams were asked to submit their models in advance to prevent cheating.
According to the competition's organizing committee, the AI teams averaged 18 points, on par with the human competitors' average, but the highest AI score was 34 points, well below the top human score of 113 points.
According to the Economic Observatory, Chen Tianchu, a researcher of large-scale models at the Institute of Computer Architecture at Zhejiang University, said that LLMs currently work by predicting the next word from the context with a certain probability and outputting the result in a single pass. In tasks that require repeated trial and careful deliberation, such as math contests, LLMs remain limited in complex reasoning and rigorous thinking. He added that AI still cannot replace humans who are professionally trained in mathematics.
About half of the AI teams' members were born after 2000 and come from institutions such as Peking University, Tsinghua University, Oxford University, Amazon Web Services, and ByteDance.
Some teams adapted large-scale open-source models to help AI progress from elementary to advanced mathematics, while others used prompt engineering with closed-source models such as GPT-4 to build AI agents that improve on GPT-4’s mathematical problem-solving abilities.
Tu Jinhao of Shanghai Jianping High School achieved the highest score among the AI teams. Inspired by the concept of self-discussion, Tu ran multiple large-scale models through multiple rounds of “self-questioning, self-answering, and self-verification” to find the optimal solution to each problem.
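The multi-round approach described above can be illustrated with a minimal Python sketch. It is not Tu's actual system, whose details were not published in this report. The "models" here are plain toy functions standing in for large language models, and all names (`self_discussion`, `cautious_model`, `bold_model`, `check_answer`) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for language models: each takes the problem and the
# feedback gathered so far, and returns a candidate answer as a string.
Proposer = Callable[[str, List[str]], str]
Verifier = Callable[[str, str], bool]


def self_discussion(
    problem: str,
    proposers: List[Proposer],
    verifier: Verifier,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Run rounds of answering and verification until a candidate passes."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        for propose in proposers:
            candidate = propose(problem, feedback)      # self-answering
            if verifier(problem, candidate):            # self-verification
                return candidate, round_no
            # Self-questioning: record the rejection so later attempts can use it.
            feedback.append(f"rejected: {candidate}")
    return None, max_rounds


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Toy "models" (assumptions for illustration only): they adjust their guess
# according to how many earlier candidates were rejected.
def cautious_model(problem: str, feedback: List[str]) -> str:
    return str(21 + len(feedback))


def bold_model(problem: str, feedback: List[str]) -> str:
    return str(25 - len(feedback))


def check_answer(problem: str, candidate: str) -> bool:
    value = int(candidate)
    return value > 20 and is_prime(value)


if __name__ == "__main__":
    answer, rounds = self_discussion(
        "Find a prime greater than 20",
        [cautious_model, bold_model],
        check_answer,
    )
    print(answer, rounds)
```

In a real system each toy function would be replaced by a call to a language model, and the verifier would itself be a model or a formal checker rather than a simple arithmetic test.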
The top three AI teams received cash prizes of $10,000, $5,000, and $2,000, respectively.
According to the organizing committee, the annual event will remain open to AI in order to encourage exploration of the limits of its potential and to promote research and innovation in its applications to mathematics.
In an interview with the Shanghai Securities News, committee member Yin Wotao said this was a positive attempt to break through the limitations of AI capabilities and bring about more possibilities.
