Achieving Olympiad-Level Geometry Large Language Model Agent via Complexity-Boosting Reinforcement Learning
December 11, 2025
Authors: Haiteng Zhao, Junhao Shen, Yiming Zhang, Songyang Gao, Kuikun Liu, Tianyou Ma, Fan Zheng, Dahua Lin, Wenwei Zhang, Kai Chen
cs.AI
Abstract
Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of formal proof systems. However, due to weak heuristics for auxiliary constructions, AI for geometry problem solving remains dominated by expert models such as AlphaGeometry 2, which rely heavily on large-scale data synthesis and search for both training and evaluation. In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry. InternGeometry overcomes the heuristic limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on the engine's feedback to guide subsequent proposals. A dynamic memory mechanism enables InternGeometry to conduct more than two hundred interactions with the symbolic engine per problem. To further accelerate learning, we introduce Complexity-Boosting Reinforcement Learning (CBRL), which gradually increases the complexity of synthesized problems across training stages. Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems (2000-2024), exceeding the average gold medalist score of 40.9, while using only 13K training examples (just 0.004% of the data used by AlphaGeometry 2). These results demonstrate the potential of LLM agents on expert-level geometry tasks. InternGeometry can also propose novel auxiliary constructions for IMO problems that do not appear in human solutions. We will release the model, data, and symbolic engine to support future research.
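The abstract describes two technical ingredients: an inference-time loop (propose a proposition or auxiliary construction, verify it with a symbolic engine, reflect on the feedback) backed by a dynamic memory, and a staged curriculum (CBRL) that raises the complexity of synthesized problems across training stages. Since no interfaces are given here, the following Python sketch is purely illustrative: `Proposer`, `SymbolicEngine`, `DynamicMemory`, `cbrl_stages`, their methods, and all numeric knobs are assumptions for exposition, not the released InternGeometry implementation.

```python
# Minimal, hypothetical sketch of the propose-verify-reflect loop and the
# CBRL-style curriculum described in the abstract. All names and numbers are
# illustrative assumptions, not the authors' released interface.
from dataclasses import dataclass, field
from typing import Iterator, Protocol


class Proposer(Protocol):
    """Stand-in for the LLM policy that proposes the next step."""
    def propose(self, problem: str, memory_summary: str) -> str: ...


class SymbolicEngine(Protocol):
    """Stand-in for the symbolic engine that checks proposals deductively."""
    def verify(self, problem: str, facts: list[str], proposal: str) -> tuple[bool, str]: ...
    def goal_reached(self, problem: str, facts: list[str]) -> bool: ...


@dataclass
class DynamicMemory:
    """Keeps the interaction history compact so the agent can sustain
    hundreds of engine calls per problem without exhausting context."""
    verified: list[str] = field(default_factory=list)
    failed: list[tuple[str, str]] = field(default_factory=list)

    def summary(self) -> str:
        # Surface all verified facts plus only a short window of recent failures.
        recent_failures = [f"FAILED: {p} ({why})" for p, why in self.failed[-5:]]
        return "\n".join(self.verified + recent_failures)


def solve(problem: str, proposer: Proposer, engine: SymbolicEngine,
          max_interactions: int = 200) -> bool:
    """Iteratively propose propositions / auxiliary constructions, verify
    them with the symbolic engine, and reflect on its feedback."""
    memory = DynamicMemory()
    for _ in range(max_interactions):
        # 1. Propose: the LLM suggests the next proposition or construction,
        #    conditioned on the problem and a summary of what is already known.
        proposal = proposer.propose(problem, memory.summary())
        # 2. Verify: the symbolic engine checks the proposal.
        ok, feedback = engine.verify(problem, memory.verified, proposal)
        # 3. Reflect: record the outcome so it guides subsequent proposals.
        if ok:
            memory.verified.append(proposal)
            if engine.goal_reached(problem, memory.verified):
                return True
        else:
            memory.failed.append((proposal, feedback))
    return False


def cbrl_stages(num_stages: int = 3) -> Iterator[dict]:
    """CBRL sketched as a curriculum: each training stage synthesizes harder
    problems; the complexity knobs shown here are hypothetical."""
    for stage in range(num_stages):
        yield {"stage": stage,
               "max_aux_constructions": 1 + stage,
               "proof_depth": 4 + 2 * stage}
```

The sketch only conveys the control flow implied by the abstract: the memory summary is what allows 200+ engine interactions to fit in the agent's context, and the staged generator is one plausible way to "gradually increase the complexity of synthesized problems" between RL stages.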