OProver: 에이전트 기반 형식 정리 증명을 위한 통합 프레임워크

초록

최근 형식 정리 증명 분야의 진전은 대규모 증명 생성과 검증기 인식 훈련의 이점을 활용해 왔지만, 에이전트 기반 증명은 추론 시에만 나타날 뿐 증명기 훈련에는 거의 통합되지 않았다. 본 논문에서는 Lean 4에서 에이전트 기반 형식 정리 증명을 위한 통합 프레임워크인 OProver를 제시한다. OProver는 실패한 증명 시도를 검색된 컴파일러 검증 증명과 Lean 컴파일러 피드백을 사용하여 반복적으로 수정한다. OProver는 지속적 사전 훈련 후 반복적 사후 훈련을 통해 훈련된다. 각 반복은 에이전트 기반 증명을 실행하고, 새로 검증된 증명을 OProofs와 검색 메모리에 색인하며, 수정 궤적을 SFT 데이터로 사용하고, 해결되지 않은 어려운 사례를 강화 학습에 사용한다. OProofs는 공개 Lean 리소스, 대규모 증명 합성, 에이전트 기반 증명 궤적으로 구축되었으며, 177만 개의 Lean 문장, 686만 개의 컴파일러 검증 증명, 검색 컨텍스트, 실패 시도, 피드백 및 수정이 포함된 직렬화된 궤적을 포함한다. 다섯 가지 벤치마크에서 OProver-32B는 MiniF2F(93.3%), ProverBench(58.2%), PutnamBench(11.3%)에서 최고의 Pass@32를 달성했으며, MathOlympiad(22.8%)와 ProofNet(33.2%)에서는 두 번째로 높은 순위를 기록하여 기존의 모든 공개 가중치 전체 증명 증명기보다 더 많은 최상위 순위를 차지했다.

English

Recent progress in formal theorem proving has benefited from large-scale proof generation and verifier-aware training, but agentic proving is rarely integrated into prover training, appearing only at inference time. We present OProver, a unified framework for agentic formal theorem proving in Lean 4, in which failed proof attempts are iteratively revised using retrieved compiler verified proofs and Lean compiler feedback. OProver is trained through continued pretraining followed by iterative post-training: each iteration runs agentic proving, indexes newly verified proofs into OProofs and the retrieval memory, uses repair trajectories as SFT data, and uses unresolved hard cases for RL. OProofs is built from public Lean resources, large-scale proof synthesis, and agentic proving traces, containing 1.77M Lean statements, 6.86M compiler-verified proofs, and serialized trajectories with retrieved context, failed attempts, feedback, and repairs. Across five benchmarks, OProver-32B attains the best Pass@32 on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%) more top placements than any prior open-weight whole-proof prover.