邁向通過解耦推理與證明解決更具挑戰性的國際數學奧林匹克問題
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
July 7, 2025
作者: Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu
cs.AI
摘要
自動定理證明(ATP)在形式語言中是人工智慧的一個基礎性挑戰。儘管大型語言模型(LLMs)推動了顯著的進展,但其強大的非形式推理能力與薄弱的正式證明效能之間仍存在顯著差距。最近的研究顯示,在如PutnamBench等基準測試中,非形式準確率超過80%,而正式成功率卻低於8%。我們認為這一差距持續存在,是因為當前最先進的證明器通過緊密耦合推理與證明,採用了無意中懲罰深度推理、偏愛淺層策略的訓練範式。為彌合這一根本性差距,我們提出了一種新框架,將高層次推理與低層次證明生成解耦。我們的方法利用兩個獨特且專門的模型:一個強大的通用推理器來生成多樣化的戰略性子目標引理,以及一個高效的證明器來嚴格驗證這些引理。這種模組化設計釋放了模型的全部推理潛力,並避開了端到端訓練的陷阱。我們在一組具有挑戰性的2000年後國際數學奧林匹克(IMO)問題上評估了我們的方法,這是一組此前沒有任何開源證明器報告過成功的問題集。我們的解耦框架成功解決了其中的5個問題,展示了在自動化推理極其困難的數學挑戰方面邁出的重要一步。為促進未來研究,我們發布了針對廣泛IMO問題生成和驗證的引理完整數據集,網址為https://tencent-imo.github.io/。
English
Automated Theorem Proving (ATP) in formal languages is a foundational
challenge for AI. While Large Language Models (LLMs) have driven remarkable
progress, a significant gap remains between their powerful informal reasoning
capabilities and their weak formal proving performance. Recent studies show
that the informal accuracy exceeds 80% while formal success remains below 8% on
benchmarks like PutnamBench. We argue this gap persists because current
state-of-the-art provers, by tightly coupling reasoning and proving, are
trained with paradigms that inadvertently punish deep reasoning in favor of
shallow, tactic-based strategies. To bridge this fundamental gap, we propose a
novel framework that decouples high-level reasoning from low-level proof
generation. Our approach utilizes two distinct, specialized models: a powerful,
general-purpose Reasoner to generate diverse, strategic subgoal lemmas, and an
efficient Prover to rigorously verify them. This modular design liberates the
model's full reasoning potential and bypasses the pitfalls of end-to-end
training. We evaluate our method on a challenging set of post-2000 IMO
problems, a problem set on which no prior open-source prover has reported
success. Our decoupled framework successfully solves 5 of these problems,
demonstrating a significant step towards automated reasoning on exceptionally
difficult mathematical challenges. To foster future research, we release our
full dataset of generated and verified lemmas for a wide range of IMO problems,
available at https://tencent-imo.github.io/ .