Seed-Prover：深度與廣度推理的自動化定理證明

摘要

大型語言模型（LLMs）通過利用長鏈思維的強化學習，展現了強大的數學推理能力，然而由於僅使用自然語言時缺乏明確的監督信號，它們在定理證明方面仍面臨挑戰。專用的領域特定語言，如Lean，通過形式化驗證證明提供了清晰的監督，從而實現了有效的強化學習訓練。在本研究中，我們提出了Seed-Prover，一種基於引理的全證明推理模型。Seed-Prover能夠根據Lean的反饋、已證明的引理以及自我總結，迭代地精煉其證明。為了解決國際數學奧林匹克（IMO）級別的競賽問題，我們設計了三種測試時推理策略，實現了深度與廣度兼備的推理。Seed-Prover證明了78.1%的形式化歷史IMO問題，在MiniF2F上達到飽和，並在PutnamBench上取得了超過50%的成績，大幅超越了先前的最先進水平。針對Lean中幾何支持的不足，我們引入了幾何推理引擎Seed-Geometry，其性能超越了先前的形式化幾何引擎。我們利用這兩個系統參與了2025年IMO，並完整證明了6道問題中的5道。這項工作代表了自動化數學推理的重大進展，展示了長鏈思維推理與形式化驗證的有效性。

English

LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50\% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.

Seed-Prover：深度與廣度推理的自動化定理證明

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

摘要

Support