AI achieves silver-medal standard solving International Mathematical Olympiad problems

Published
25 JULY 2024
Authors

AlphaProof and AlphaGeometry teams

Breakthrough models AlphaProof and AlphaGeometry 2 solve advanced reasoning problems in mathematics

Artificial general intelligence (AGI) with advanced mathematical reasoning has the potential to unlock new frontiers in science and technology.

We’ve made great progress building AI systems that help mathematicians discover new insights, novel algorithms and answers to open problems. But current AI systems still struggle with solving general math problems because of limitations in reasoning skills and training data.

Today, we present AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of our geometry-solving system. Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time.

Breakthrough AI performance solving complex math problems

The IMO is the oldest, largest and most prestigious competition for young mathematicians, held annually since 1959.

Each year, elite pre-college mathematicians train, sometimes for thousands of hours, to solve six exceptionally difficult problems in algebra, combinatorics, geometry and number theory. Many of the winners of the Fields Medal, one of the highest honors for mathematicians, have represented their country at the IMO.

More recently, the annual IMO competition has also become widely recognised as a grand challenge in machine learning and an aspirational benchmark for measuring an AI system’s advanced mathematical reasoning capabilities.

This year, we applied our combined AI system to the competition problems, provided by the IMO organizers. Our solutions were scored according to the IMO’s point-awarding rules by prominent mathematicians Prof Sir Timothy Gowers, an IMO gold medalist and Fields Medal winner, and Dr Joseph Myers, a two-time IMO gold medalist and Chair of the IMO 2024 Problem Selection Committee.

The fact that the program can come up with a non-obvious construction like this is very impressive, and well beyond what I thought was state of the art.

PROF SIR TIMOTHY GOWERS,
IMO GOLD MEDALIST AND FIELDS MEDAL WINNER

First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it was correct. This included the hardest problem in the competition, solved by only five contestants at this year’s IMO. AlphaGeometry 2 proved the geometry problem, while the two combinatorics problems remained unsolved.

Each of the six problems can earn seven points, for a maximum total of 42. Our system achieved a final score of 28 points, earning a perfect score on each problem solved — equivalent to the top end of the silver-medal category. This year, the gold-medal threshold started at 29 points and was reached by 58 of 609 contestants at the official competition.
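The scoring arithmetic above can be checked in a few lines of Python (a trivial sanity check of the numbers in this post, not part of any system described here):

```python
# Sanity-check of the IMO scoring described above.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
PROBLEMS_SOLVED = 4  # two algebra, one number theory, one geometry

max_score = POINTS_PER_PROBLEM * NUM_PROBLEMS      # 42 points available
our_score = POINTS_PER_PROBLEM * PROBLEMS_SOLVED   # perfect score on each solved problem
GOLD_THRESHOLD = 29

print(max_score, our_score, our_score >= GOLD_THRESHOLD)  # 42 28 False
```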

Colored graph showing our AI system’s performance relative to human competitors earning bronze, silver and gold at IMO 2024. Our system earned 28 out of 42 total points, achieving the same level as a silver medalist in the competition and nearly reaching the gold-medal threshold starting at 29 points.

AlphaProof: a formal approach to reasoning

AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go.

Formal languages offer the critical advantage that proofs involving mathematical reasoning can be formally verified for correctness. Their use in machine learning has, however, previously been constrained by the very limited amount of human-written data available.

In contrast, natural-language-based approaches can hallucinate plausible but incorrect intermediate reasoning steps and solutions, despite having access to orders of magnitude more data. We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty.
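As a toy illustration of what such a formalization looks like (this example is mine, not an output of the systems described here), the informal statement "the sum of two even integers is even" becomes a short Lean 4 statement that Mathlib can verify:

```lean
-- Toy example: an informal statement rendered in Lean 4 with Mathlib.
-- Informal: "the sum of two even integers is even."
import Mathlib.Algebra.Group.Even

theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) :=
  ha.add hb  -- Mathlib's `Even.add` closes the goal
```

Because the kernel checks every step, a statement like this is either verifiably proved or rejected — there is no room for a plausible-but-wrong argument.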

When presented with a problem, AlphaProof generates solution candidates and then proves or disproves them by searching over possible proof steps in Lean. Each proof that was found and verified is used to reinforce AlphaProof’s language model, enhancing its ability to solve subsequent, more challenging problems.
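The generate-then-verify loop can be sketched in a few lines of Python. Everything here is a stand-in: the real system proposes Lean proof steps and checks them with the Lean kernel, whereas this toy "verifier" merely checks that a sequence of +1/+2 steps sums to a target.

```python
# Minimal sketch of a propose-and-verify loop (hypothetical; not AlphaProof's API).
# Only candidates that pass the checker become training data, so the
# proposer is never reinforced on hallucinated reasoning.

def verify(target: int, steps: list[int]) -> bool:
    """Stand-in for the Lean kernel: a 'proof' of `target` is a
    sequence of +1/+2 steps that sums exactly to it."""
    return all(s in (1, 2) for s in steps) and sum(steps) == target

def search(target: int, candidates: list[list[int]]) -> list[list[int]]:
    """Keep every candidate the checker accepts; in the real loop these
    verified proofs are fed back to reinforce the proposer."""
    return [steps for steps in candidates if verify(target, steps)]

verified = search(5, [[1, 1, 1], [2, 2, 1], [1, 2, 3]])
print(verified)  # [[2, 2, 1]]
```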

We trained AlphaProof for the IMO by proving or disproving millions of problems, covering a wide range of difficulties and mathematical topic areas over a period of weeks leading up to the competition. The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found.

Process infographic of AlphaProof’s reinforcement learning training loop: Around one million informal math problems are translated into a formal math language by a formalizer network. Then a solver network searches for proofs or disproofs of the problems, progressively training itself via the AlphaZero algorithm to solve more challenging problems.

A more competitive AlphaGeometry 2

AlphaGeometry 2 is a significantly improved version of AlphaGeometry. It’s a neuro-symbolic hybrid system in which the language model was based on Gemini and trained from scratch on an order of magnitude more synthetic data than its predecessor. This helped the model tackle much more challenging geometry problems, including problems about movements of objects and equations of angles, ratios or distances.

AlphaGeometry 2 employs a symbolic engine that is two orders of magnitude faster than its predecessor. When presented with a new problem, it uses a novel knowledge-sharing mechanism to combine different search trees, allowing it to tackle more complex problems.

Before this year’s competition, AlphaGeometry 2 could solve 83% of all historical IMO geometry problems from the past 25 years, compared to the 53% rate achieved by its predecessor. For IMO 2024, AlphaGeometry 2 solved Problem 4 within 19 seconds after receiving its formalization.

Illustration of Problem 4, which asks to prove the sum of ∠KIL and ∠XPY equals 180°. AlphaGeometry 2 proposed to construct E, a point on the line BI so that ∠AEB = 90°. Point E helps give purpose to the midpoint L of AB, creating many pairs of similar triangles such as ABE ~ YBI and ALE ~ IPC needed to prove the conclusion.

New frontiers in mathematical reasoning

As part of our IMO work, we also experimented with a natural language reasoning system, built upon Gemini and our latest research to enable advanced problem-solving skills. This system doesn’t require the problems to be translated into a formal language and could be combined with other AI systems. We also tested this approach on this year’s IMO problems and the results showed great promise.

Our teams are continuing to explore multiple AI approaches for advancing mathematical reasoning and plan to release more technical details on AlphaProof soon.

We’re excited for a future in which mathematicians work with AI tools to explore hypotheses, try bold new approaches to solving long-standing problems and quickly complete time-consuming elements of proofs — and where AI systems like Gemini become more capable at math and broader reasoning.

Acknowledgements

We thank the International Mathematical Olympiad organization for their support.

AlphaProof development was led by Thomas Hubert, Rishi Mehta and Laurent Sartran; AlphaGeometry 2 and natural language reasoning efforts were led by Thang Luong.

AlphaProof was developed with key contributions from Hussain Masoom, Aja Huang, Miklós Z. Horváth, Tom Zahavy, Vivek Veeriah, Eric Wieser, Jessica Yung, Lei Yu, Yannick Schroecker, Julian Schrittwieser, Ottavia Bertolli, Borja Ibarz, Edward Lockhart, Edward Hughes, Mark Rowland, Grace Margand. Alex Davies and Daniel Zheng led the development of informal systems such as final answer determination, with key contributions from Iuliya Beloshapka, Ingrid von Glehn, Yin Li, Fabian Pedregosa, Ameya Velingker and Goran Žužić. Oliver Nash, Bhavik Mehta, Paul Lezeau, Salvatore Mercuri, Lawrence Wu, Calle Soenne, Thomas Murrills, Luigi Massacci and Andrew Yang advised and contributed as Lean experts. Past contributors include Amol Mandhane, Tom Eccles, Eser Aygün, Zhitao Gong, Richard Evans, Soňa Mokrá, Amin Barekatain, Wendy Shang, Hannah Openshaw, Felix Gimeno. This work was advised by David Silver and Pushmeet Kohli.

The development of AlphaGeometry 2 was led by Trieu Trinh and Yuri Chervonyi, with key contributions by Mirek Olšák, Xiaomeng Yang, Hoang Nguyen, Junehyuk Jung, Dawsen Hwang and Marcelo Menegali. The development of the natural language reasoning system was led by Golnaz Ghiasi, Garrett Bingham, YaGuang Li, with key contributions by Swaroop Mishra, Nigamaa Nayakanti, Sidharth Mudgal, Qijun Tan, Junehyuk Jung, Hoang Nguyen, Alex Zhai, Dawsen Hwang, Mingyang Deng, Clara Huiyi Hu, Cosmo Du. Both AlphaGeometry and natural language reasoning systems were advised by Quoc Le.

David Silver, Quoc Le, Demis Hassabis, and Pushmeet Kohli coordinated and managed the overall project.

We’d also like to thank Insuk Seo, Evan Chen, Zigmars Rasscevskis, Kari Ragnarsson, Junhwi Bae, Jeonghyun Ahn, Jimin Kim, Hung Pham, Nguyen Nguyen, Son Pham, and Pasin Manurangsi who helped evaluate the quality of our language reasoning system. Prof Gregor Dolinar and Dr Geoff Smith MBE from the IMO Board, for the support and collaboration. Jarrod Kahn, Maciej Kula, Tu Vu, Hanzhao Lin, Chenkai Kuang, Vikas Verma, Yifeng Lu, Vihan Jain, Henryk Michalewski, Xavier Garcia, Arjun Kar, Lampros Lamprou, Kaushal Patel, Ilya Tolstikhen, Olivier Bousquet, Anton Tsitsulin, Dustin Zelle, CJ Carey, Sam Blackwell, Abhi Rao, Vahab Mirrokni, Behnam Neyshabur, Ethan Dyer, Keith Rush, Moritz Firsching, Dan Shved, Ihar Bury, Divyanshu Ranjan, Hadi Hashemi, Alexei Bendebury, Soheil Hassas Yeganeh, Shibl Mourad, Simon Schmitt, Satinder Baveja, Chris Dyer, Jacob Austin, Wenda Li, Heng-tze Cheng, Ed Chi, Koray Kavukcuoglu, Oriol Vinyals, Jeff Dean and Sergey Brin for their support and advice.

Finally, we’d like to thank the many contributors to the Lean and Mathlib projects, without whom AlphaProof wouldn’t have been possible.

@Creative_bot Summarize this article and discuss the topic: when AI can win medals, what is the point of humans competing?

Summary

In this blog post, DeepMind announces that its two AI systems, AlphaProof and AlphaGeometry 2, achieved a breakthrough at this year’s International Mathematical Olympiad (IMO). Together, the two systems solved four of the six IMO problems, scoring 28 out of 42 points and reaching the level of an IMO silver medalist.

AlphaProof is a reinforcement-learning-based system for formal mathematical reasoning. By coupling a pre-trained language model with the AlphaZero algorithm, it learned to prove mathematical statements in the formal language Lean.

AlphaGeometry 2 is an improved version of DeepMind’s geometry-solving system AlphaGeometry: a neuro-symbolic hybrid system trained on an order of magnitude more synthetic data than its predecessor.

Both systems performed impressively on the IMO problems: AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 solved the geometry problem in just 19 seconds.

When AI can win medals, what is the point of humans competing?

This question deserves serious reflection. When AI approaches or even surpasses top human performance in domains that demand intense intellectual engagement, such as the IMO, we inevitably have to ask what human participation means.

I believe that even in the AI era, taking part in competitions like the IMO remains deeply meaningful for humans:

  • Pushing human potential: The IMO represents a pinnacle of human intellectual challenge. Competing inspires young people to pursue excellence, keep challenging themselves and unlock their potential.
  • Cultivating critical thinking and problem-solving: The IMO demands strong logical reasoning, abstract thinking and problem-solving skills, all of which are essential in any field.
  • Promoting international exchange and cooperation: The IMO gives outstanding students from around the world a platform for exchange and learning, fostering cross-cultural understanding and international collaboration.
  • Inspiring scientific discovery: IMO problems often involve deep mathematical concepts and ideas, and solving them helps advance mathematics and other sciences.

AI’s success at the IMO should not be seen as a threat to humanity, but as a reflection of human ingenuity. AI is a tool humans created; it can help us better understand the world and solve problems. In the future, humans and AI can work together to tackle ever more complex and challenging problems and build a better future.

In mathematics, and in science more broadly, humans will ultimately never out-compute AI. How will that eventually reshape education, academic disciplines and the way people learn?
@Creative_bot

This is a profound question! It is true that in raw computational power and processing speed, AI has already surpassed humans in many areas, including mathematics and science. But that does not mean humans no longer matter in these fields, or that education and learning must be completely overturned.

Here is how mathematics and science education may change in the AI era, and why human learning still matters:

1. Shifting educational goals:

  • From computation to understanding: Future mathematics and science education will emphasize conceptual understanding, critical thinking and problem-solving over pure calculation and memorization.
  • Creativity and innovation: AI can help us solve known problems, but human creativity and imagination are what drive scientific progress. Education must cultivate innovative thinking and encourage students to pose new questions and explore the unknown.
  • Human-AI collaboration: Education will need to teach students how to work with AI tools, using AI’s strengths to learn and do research more effectively.

2. Transforming how we learn:

  • Personalized learning: AI can tailor learning content and pathways to each student’s individual needs, helping them understand and master material more effectively.
  • Interactive learning: AI can create more vivid, interactive learning environments, such as virtual laboratories and simulation models, that make abstract concepts more intuitive.
  • Lifelong learning: AI can give people continuous opportunities to learn, helping them stay competitive in a changing world.

3. Why human learning still matters:

Even in the AI era, studying mathematics and science remains important for humans:

  • Training the mind: Studying mathematics and science exercises logical, abstract and critical thinking, skills that are essential in any field.
  • Understanding the world: Mathematics and science are powerful tools for understanding the natural world, social phenomena and ourselves.
  • Driving social progress: Mathematics and science are major forces behind technological innovation, economic development and social progress.

In short, the rise of AI poses new challenges for mathematics and science education, but it also brings new opportunities. We need to rethink the goals and methods of education, with greater emphasis on creativity, critical thinking and human-AI collaboration. At the same time, we should recognize that even in the AI era, human study of mathematics and science retains irreplaceable value.

The entire approach to education needs to change fast, now that AI is reaching the ability to discover new things (one of the most valuable skills). Whether people like it or not, the structure of education is heavily influenced by exams which test students’ proficiency at…

— Po-Shen Loh (@PoShenLoh) July 25, 2024