ChatPaper.aiChatPaper

Seed-Prover:深度与广度兼备的自动定理证明推理系统

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

July 31, 2025
作者: Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, Yue Zhang, Ge Zhang, Tianyun Zhao, Jianqiu Zhao, Yichi Zhou, Thomas Hanwen Zhu
cs.AI

摘要

大型语言模型(LLMs)通过结合长链思维强化学习,已展现出强大的数学推理能力。然而,由于仅依赖自然语言时缺乏明确的监督信号,它们在定理证明方面仍面临挑战。专用领域特定语言如Lean通过形式化验证提供清晰的监督,使得通过强化学习进行有效训练成为可能。在本研究中,我们提出了Seed-Prover,一种基于引理的全证明推理模型。Seed-Prover能够根据Lean的反馈、已证明的引理及自我总结,迭代优化其证明过程。为解决国际数学奥林匹克(IMO)级别的竞赛问题,我们设计了三种测试时推理策略,支持深度与广度兼备的推理。Seed-Prover成功证明了78.1%的形式化历史IMO问题,在MiniF2F上达到饱和,并在PutnamBench上取得超过50%的成绩,大幅超越了之前的最先进水平。针对Lean在几何支持上的不足,我们引入了几何推理引擎Seed-Geometry,其表现优于以往的形式化几何引擎。我们利用这两个系统参与了2025年IMO,并完整证明了六道题目中的五道。此研究标志着自动化数学推理领域的重大进展,展示了长链思维推理与形式化验证相结合的有效性。
English
LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50\% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
PDF722August 1, 2025