Goedel-Prover-V2：通过支架式数据合成与自我校正扩展形式定理证明

摘要

我们推出了Goedel-Prover-V2，这是一系列在自动定理证明领域树立新标杆的开源语言模型。基于标准的专家迭代与强化学习流程，我们的方法融合了三大创新点：（1）阶梯式数据合成：通过生成难度递增的合成任务，训练模型逐步掌握更为复杂的定理；（2）验证器引导的自我修正：借助Lean编译器的反馈，使模型能够迭代修正其证明；（3）模型平均：合并模型检查点，以缓解训练后期模型输出多样性的下降。我们的轻量级模型Goedel-Prover-V2-8B，在MiniF2F测试中达到了84.6%的pass@32成绩，尽管体积仅为DeepSeek-Prover-V2-671B的1/80，却在该指标上超越了后者。旗舰模型Goedel-Prover-V2-32B，在标准模式下于MiniF2F测试中取得88.1%的pass@32成绩，在自我修正模式下更是达到90.4%，大幅领先于先前的最优水平。此外，该旗舰模型在PutnamBench上以pass@184解决了86道题目，在开源模型排行榜上位居首位，超越了DeepSeek-Prover-V2-671B以pass@1024解决47道题目的记录，且模型规模与计算预算显著更小。截至发布之时（2025年7月至8月），Goedel-Prover-V2在所有开源定理证明器中展现了最强的综合性能。在受限的测试计算预算下，它亦跻身于包括公开报告性能的闭源系统在内的顶尖模型之列。我们的模型、代码及数据已发布于https://github.com/Goedel-LM/Goedel-Prover-V2。

English

We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models--including closed-source systems with publicly reported performance--under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.

Goedel-Prover-V2：通过支架式数据合成与自我校正扩展形式定理证明

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

摘要

Support