Goedel-Prover-V2: 스캐폴드 데이터 합성 및 자기 수정을 통한 형식적 정리 증명의 확장

초록

Goedel-Prover-V2 시리즈는 자동 정리 증명 분야에서 새로운 최첨단 기술을 제시하는 오픈소스 언어 모델입니다. 표준 전문가 반복 및 강화 학습 파이프라인을 기반으로 구축된 이 접근법은 세 가지 주요 혁신을 통합합니다: (1) 스캐폴딩 데이터 합성: 점점 더 복잡한 정리를 마스터할 수 있도록 모델을 훈련시키기 위해 난이도가 점진적으로 증가하는 합성 작업을 생성합니다; (2) 검증기 기반 자기 수정: Lean 컴파일러의 피드백을 활용하여 모델이 증명을 반복적으로 수정할 수 있도록 합니다; (3) 모델 평균화: 훈련 후반부에 모델 출력 다양성이 감소하는 문제를 완화하기 위해 모델 체크포인트를 병합합니다. 우리의 소형 모델인 Goedel-Prover-V2-8B는 MiniF2F에서 84.6%의 pass@32를 달성하며, 동일한 지표에서 DeepSeek-Prover-V2-671B를 능가합니다. 이는 모델 크기가 80배 더 작음에도 불구하고 이루어진 성과입니다. 우리의 주력 모델인 Goedel-Prover-V2-32B는 MiniF2F에서 표준 모드에서 88.1%, 자기 수정 모드에서 90.4%의 pass@32를 달성하여 기존 최고 기록을 크게 앞질렀습니다. 또한, 이 주력 모델은 PutnamBench에서 pass@184 기준으로 86개의 문제를 해결하여 오픈소스 모델 리더보드에서 1위를 차지했습니다. 이는 DeepSeek-Prover-V2-671B가 pass@1024 기준으로 47개의 문제를 해결한 기록을 훨씬 더 작은 모델 크기와 컴퓨팅 예산으로 능가한 것입니다. Goedel-Prover-V2는 출시 시점(2025년 7월~8월)에 모든 오픈소스 정리 증명기 중에서 가장 강력한 전반적인 성능을 보여주었습니다. 또한, 제한된 테스트 시간 컴퓨팅 예산 하에서 공개적으로 보고된 성능을 가진 클로즈드소스 시스템을 포함한 최고 성능 모델들 중 하나로 자리 잡았습니다. 우리의 모델, 코드, 데이터는 https://github.com/Goedel-LM/Goedel-Prover-V2에서 공개됩니다.

English

We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models--including closed-source systems with publicly reported performance--under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.

Goedel-Prover-V2: 스캐폴드 데이터 합성 및 자기 수정을 통한 형식적 정리 증명의 확장

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

초록

Support