ChatPaper.aiChatPaper

Seed-Prover 1.5:通过经验学习掌握本科水平定理证明

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

December 19, 2025
作者: Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, Thomas Hanwen Zhu
cs.AI

摘要

近期,大型语言模型在生成严谨数学证明方面取得显著进展。然而,利用LLM进行形式化语言(如Lean)的定理证明仍具挑战性且计算成本高昂,尤其在处理本科及以上难度的问题时。本研究提出Seed-Prover 1.5——一个通过大规模智能体强化学习训练的形式定理证明模型,并配套高效测试时扩展工作流。该模型通过与Lean等工具的持续交互,在强化学习过程中不断积累经验,显著提升了形式化定理证明的能力与效率。此外,基于自然语言证明的最新进展,我们的测试时扩展工作流能有效弥合自然语言与形式化语言之间的鸿沟。相比现有最优方法,Seed-Prover 1.5以更小的计算预算实现了更优异的性能:在本科难度的PutnamBench中解决88%问题,研究生难度的Fate-H中解决80%问题,博士难度的Fate-X中解决33%问题。值得关注的是,该系统在9小时内完成了2025年普特南数学竞赛12道题目中的11道。我们的研究表明,基于高质量形式化反馈的经验学习规模化拓展,将为形式化数学推理的未来发展开辟巨大潜力。
English
Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present Seed-Prover 1.5, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves 88\% of PutnamBench (undergraduate-level), 80\% of Fate-H (graduate-level), and 33\% of Fate-X (PhD-level) problems. Notably, using our system, we solved 11 out of 12 problems from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.
PDF371December 23, 2025