OProver：一個用於智能體形式定理證明的統一框架

摘要

近期形式化定理证明的进展得益于大规模证明生成和验证器感知训练，但智能体证明很少被整合进证明器训练中，仅在推理时出现。我们提出 OProver，一个用于 Lean 4 中智能体形式化定理证明的统一框架，在该框架中，失败的证明尝试通过检索编译器验证的证明和 Lean 编译器反馈进行迭代修正。OProver 通过持续预训练及后续的迭代后训练进行训练：每次迭代执行智能体证明，将新验证的证明索引到 OProofs 和检索记忆中，将修复轨迹用作 SFT 数据，并将未解决的困难实例用于强化学习。OProofs 基于公开的 Lean 资源、大规模证明合成以及智能体证明轨迹构建，包含 177 万条 Lean 语句、686 万个编译器验证的证明，以及带有检索上下文、失败尝试、反馈和修复的序列化轨迹。在五个基准测试中，OProver-32B 在 MiniF2F（93.3%）、ProverBench（58.2%）和 PutnamBench（11.3%）上取得了最佳 Pass@32 成绩，在 MathOlympiad（22.8%）和 ProofNet（33.2%）上排名第二，其顶级排名数量超过之前任何开源权重的全证明证明器。

English

Recent progress in formal theorem proving has benefited from large-scale proof generation and verifier-aware training, but agentic proving is rarely integrated into prover training, appearing only at inference time. We present OProver, a unified framework for agentic formal theorem proving in Lean 4, in which failed proof attempts are iteratively revised using retrieved compiler verified proofs and Lean compiler feedback. OProver is trained through continued pretraining followed by iterative post-training: each iteration runs agentic proving, indexes newly verified proofs into OProofs and the retrieval memory, uses repair trajectories as SFT data, and uses unresolved hard cases for RL. OProofs is built from public Lean resources, large-scale proof synthesis, and agentic proving traces, containing 1.77M Lean statements, 6.86M compiler-verified proofs, and serialized trajectories with retrieved context, failed attempts, feedback, and repairs. Across five benchmarks, OProver-32B attains the best Pass@32 on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%) more top placements than any prior open-weight whole-proof prover.