**OProver: エージェント指向形式的定理証明のための統一フレームワーク**

要旨

形式定理証明の分野における最近の進展は、大規模な証明生成や検証器を考慮した訓練によって恩恵を受けているが、エージェント的な証明手法がプローバーの訓練に組み込まれることは稀であり、推論時にのみ現れる。本稿では、Lean 4におけるエージェント的定理証明のための統一フレームワークであるOProverを提案する。OProverは、失敗した証明試行を、検索されたコンパイラ検証済み証明とLeanコンパイラのフィードバックを用いて反復的に修正する。OProverは、継続事前学習とそれに続く反復的事後訓練によって訓練される。各反復では、エージェント的証明を実行し、新たに検証された証明をOProofsと検索用メモリに索引付けし、修復軌跡をSFTデータとして使用し、未解決の難解事例を強化学習に用いる。OProofsは、公開されたLeanリソース、大規模な証明合成、およびエージェント的証明の軌跡から構築されており、177万のLean文、686万のコンパイラ検証済み証明、および検索コンテキスト、失敗試行、フィードバック、修復を含む系列化された軌跡を収録している。5つのベンチマークにおいて、OProver-32BはMiniF2F（93.3%）、ProverBench（58.2%）、PutnamBench（11.3%）で最高のPass@32を達成し、MathOlympiad（22.8%）とProofNet（33.2%）では2位となり、これまでのオープンウェイトの全証明プローバーよりも多くのトップ順位を獲得した。

English

Recent progress in formal theorem proving has benefited from large-scale proof generation and verifier-aware training, but agentic proving is rarely integrated into prover training, appearing only at inference time. We present OProver, a unified framework for agentic formal theorem proving in Lean 4, in which failed proof attempts are iteratively revised using retrieved compiler verified proofs and Lean compiler feedback. OProver is trained through continued pretraining followed by iterative post-training: each iteration runs agentic proving, indexes newly verified proofs into OProofs and the retrieval memory, uses repair trajectories as SFT data, and uses unresolved hard cases for RL. OProofs is built from public Lean resources, large-scale proof synthesis, and agentic proving traces, containing 1.77M Lean statements, 6.86M compiler-verified proofs, and serialized trajectories with retrieved context, failed attempts, feedback, and repairs. Across five benchmarks, OProver-32B attains the best Pass@32 on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%) more top placements than any prior open-weight whole-proof prover.