Seed-Prover：深度与广度兼备的自动定理证明推理系统

摘要

大型语言模型（LLMs）通过结合长链思维强化学习，已展现出强大的数学推理能力。然而，由于仅依赖自然语言时缺乏明确的监督信号，它们在定理证明方面仍面临挑战。专用领域特定语言如Lean通过形式化验证提供清晰的监督，使得通过强化学习进行有效训练成为可能。在本研究中，我们提出了Seed-Prover，一种基于引理的全证明推理模型。Seed-Prover能够根据Lean的反馈、已证明的引理及自我总结，迭代优化其证明过程。为解决国际数学奥林匹克（IMO）级别的竞赛问题，我们设计了三种测试时推理策略，支持深度与广度兼备的推理。Seed-Prover成功证明了78.1%的形式化历史IMO问题，在MiniF2F上达到饱和，并在PutnamBench上取得超过50%的成绩，大幅超越了之前的最先进水平。针对Lean在几何支持上的不足，我们引入了几何推理引擎Seed-Geometry，其表现优于以往的形式化几何引擎。我们利用这两个系统参与了2025年IMO，并完整证明了六道题目中的五道。此研究标志着自动化数学推理领域的重大进展，展示了长链思维推理与形式化验证相结合的有效性。

English

LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50\% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.