Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

December 11, 2025
Authors: Haiteng Zhao, Junhao Shen, Yiming Zhang, Songyang Gao, Kuikun Liu, Tianyou Ma, Fan Zheng, Dahua Lin, Wenwei Zhang, Kai Chen
cs.AI

Abstract

Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of formal proof systems. However, due to weak heuristics for auxiliary constructions, AI for geometry problem solving remains dominated by expert models such as AlphaGeometry 2, which rely heavily on large-scale data synthesis and search for both training and evaluation. In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry. InternGeometry overcomes the heuristic limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on the engine's feedback to guide subsequent proposals. A dynamic memory mechanism enables InternGeometry to conduct more than two hundred interactions with the symbolic engine per problem. To further accelerate learning, we introduce Complexity-Boosting Reinforcement Learning (CBRL), which gradually increases the complexity of synthesized problems across training stages. Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems (2000-2024), exceeding the average gold medalist score (40.9), using only 13K training examples, just 0.004% of the data used by AlphaGeometry 2, demonstrating the potential of LLM agents on expert-level geometry tasks. InternGeometry can also propose novel auxiliary constructions for IMO problems that do not appear in human solutions. We will release the model, data, and symbolic engine to support future research.
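The propose, verify, and reflect cycle described in the abstract can be illustrated with a small toy. The sketch below is not the InternGeometry implementation: `ToyEngine`, `toy_propose`, `solve`, `run_toy`, the three-fact rule table, and the `memory_size` parameter are all hypothetical stand-ins. A bounded `deque` is only an assumed simplification of the dynamic memory mechanism, and a hand-written rule replaces the LLM policy.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple

Feedback = Tuple[str, bool]  # (proposal, accepted by the engine?)

# Hypothetical rule table: each fact lists the facts it depends on.
RULES: Dict[str, Set[str]] = {
    "aux_point": set(),        # an auxiliary construction with no prerequisites
    "lemma": {"aux_point"},    # provable once the auxiliary point exists
    "goal": {"lemma"},         # provable once the lemma is established
}
CANDIDATES: List[str] = ["goal", "lemma", "aux_point"]


class ToyEngine:
    """Stand-in for a symbolic engine: accepts a fact when its prerequisites are known."""

    def __init__(self, rules: Dict[str, Set[str]]) -> None:
        self.rules = rules
        self.known: Set[str] = set()

    def verify(self, fact: str) -> bool:
        if fact in self.rules and self.rules[fact] <= self.known:
            self.known.add(fact)
            return True
        return False


def toy_propose(goal: str, memory: List[Feedback]) -> str:
    """Hand-written 'reflection': skip facts already verified or rejected since the last success."""
    verified = {p for p, ok in memory if ok}
    since_success: List[str] = []
    for p, ok in reversed(memory):
        if ok:
            break
        since_success.append(p)
    for cand in CANDIDATES:
        if cand not in verified and cand not in since_success:
            return cand
    return goal


def solve(goal: str,
          propose: Callable[[str, List[Feedback]], str],
          verify: Callable[[str], bool],
          max_steps: int = 200,
          memory_size: int = 8) -> Tuple[bool, int]:
    """Run the loop; return (solved, number of engine interactions used)."""
    memory: Deque[Feedback] = deque(maxlen=memory_size)  # only recent feedback is kept
    for step in range(1, max_steps + 1):
        proposal = propose(goal, list(memory))
        ok = verify(proposal)
        memory.append((proposal, ok))
        if ok and proposal == goal:
            return True, step
    return False, max_steps


def run_toy(max_steps: int = 200) -> Tuple[bool, int]:
    engine = ToyEngine(RULES)  # fresh engine state per run
    return solve("goal", toy_propose, engine.verify, max_steps=max_steps)


print(run_toy())  # (True, 6)
```

In this toy the goal is rejected twice before the auxiliary point is introduced, which mirrors in miniature how engine feedback steers later proposals. The real system's proposal policy, memory handling, and engine interface are not specified in the abstract.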