Seed-Prover:深度與廣度推理的自動化定理證明
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
July 31, 2025
作者: Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, Yue Zhang, Ge Zhang, Tianyun Zhao, Jianqiu Zhao, Yichi Zhou, Thomas Hanwen Zhu
cs.AI
摘要
大型語言模型(LLMs)通過利用長鏈思維的強化學習,展現了強大的數學推理能力,然而由於僅使用自然語言時缺乏明確的監督信號,它們在定理證明方面仍面臨挑戰。專用的領域特定語言,如Lean,通過形式化驗證證明提供了清晰的監督,從而實現了有效的強化學習訓練。在本研究中,我們提出了Seed-Prover,一種基於引理的全證明推理模型。Seed-Prover能夠根據Lean的反饋、已證明的引理以及自我總結,迭代地精煉其證明。為了解決國際數學奧林匹克(IMO)級別的競賽問題,我們設計了三種測試時推理策略,實現了深度與廣度兼備的推理。Seed-Prover證明了78.1%的形式化歷史IMO問題,在MiniF2F上達到飽和,並在PutnamBench上取得了超過50%的成績,大幅超越了先前的最先進水平。針對Lean中幾何支持的不足,我們引入了幾何推理引擎Seed-Geometry,其性能超越了先前的形式化幾何引擎。我們利用這兩個系統參與了2025年IMO,並完整證明了6道問題中的5道。這項工作代表了自動化數學推理的重大進展,展示了長鏈思維推理與形式化驗證的有效性。
English
LLMs have demonstrated strong mathematical reasoning abilities by leveraging
reinforcement learning with long chain-of-thought, yet they continue to
struggle with theorem proving due to the lack of clear supervision signals when
solely using natural language. Dedicated domain-specific languages like Lean
provide clear supervision via formal verification of proofs, enabling effective
training through reinforcement learning. In this work, we propose
Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover
can iteratively refine its proof based on Lean feedback, proved lemmas, and
self-summarization. To solve IMO-level contest problems, we design three
test-time inference strategies that enable both deep and broad reasoning.
Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F,
and achieves over 50\% on PutnamBench, outperforming the previous
state-of-the-art by a large margin. To address the lack of geometry support in
Lean, we introduce a geometry reasoning engine Seed-Geometry, which
outperforms previous formal geometry engines. We use these two systems to
participate in IMO 2025 and fully prove 5 out of 6 problems. This work
represents a significant advancement in automated mathematical reasoning,
demonstrating the effectiveness of formal verification with long
chain-of-thought reasoning.