OProver: 一个用于智能体化形式定理证明的统一框架

摘要

近年来，形式定理证明的进展得益于大规模证明生成和验证器感知训练，但智能体推理很少融入证明器的训练过程，仅出现于推理阶段。我们提出 OProver——一个面向 Lean 4 的智能体形式定理证明统一框架，在该框架中，失败的证明尝试会通过检索编译器验证的证明和 Lean 编译器反馈进行迭代修正。OProver 通过持续预训练及迭代后训练进行优化：每次迭代运行智能体推理，将新验证的证明索引至 OProofs 与检索记忆库，利用修复轨迹作为监督微调数据，并使用未解决的困难案例进行强化学习。OProofs 基于公开的 Lean 资源、大规模证明合成及智能体推理轨迹构建，包含 177 万条 Lean 语句、686 万个编译器验证的证明，以及序列化的轨迹（含检索上下文、失败尝试、反馈与修复）。在五个基准测试中，OProver-32B 在 MiniF2F（93.3%）、ProverBench（58.2%）和 PutnamBench（11.3%）上取得最佳 Pass@32 成绩，在 MathOlympiad（22.8%）和 ProofNet（33.2%）上排名第二，其顶级排名数量超过所有此前开源的整段证明器。

English

Recent progress in formal theorem proving has benefited from large-scale proof generation and verifier-aware training, but agentic proving is rarely integrated into prover training, appearing only at inference time. We present OProver, a unified framework for agentic formal theorem proving in Lean 4, in which failed proof attempts are iteratively revised using retrieved compiler verified proofs and Lean compiler feedback. OProver is trained through continued pretraining followed by iterative post-training: each iteration runs agentic proving, indexes newly verified proofs into OProofs and the retrieval memory, uses repair trajectories as SFT data, and uses unresolved hard cases for RL. OProofs is built from public Lean resources, large-scale proof synthesis, and agentic proving traces, containing 1.77M Lean statements, 6.86M compiler-verified proofs, and serialized trajectories with retrieved context, failed attempts, feedback, and repairs. Across five benchmarks, OProver-32B attains the best Pass@32 on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%) more top placements than any prior open-weight whole-proof prover.