哥德爾證明者:開源自動定理證明的前沿模型
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
February 11, 2025
作者: Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
cs.AI
摘要
我們介紹 Goedel-Prover,一個開源的大型語言模型(LLM),在數學問題的自動形式證明生成方面實現了最先進的性能。這個領域的主要挑戰在於正式化數學陳述和證明的稀缺性,我們通過以下方式應對。我們訓練陳述正式化器將 Numina 的自然語言數學問題轉換為正式語言(Lean 4),創建了一個包含 1.64 百萬個正式陳述的數據集。LLM 用於檢查正式陳述是否準確地保留了原始自然語言問題的內容。然後,我們通過訓練一系列證明器來迭代地構建一個大型的正式證明數據集。每個證明器成功地證明了許多之前的證明器無法證明的陳述,這些新的證明被添加到下一個證明器的訓練集中。最終的證明器在整個證明生成方面優於所有現有的開源模型。在 miniF2F 基準測試中,它實現了 57.6% 的成功率(Pass@32),超過之前最好的開源模型 7.6%。在 PutnamBench 上,Goedel-Prover 成功解決了 7 個問題(Pass@512),在排行榜上名列第一。此外,它為 Lean Workbook 問題生成了 29.7K 個正式證明,幾乎是之前作品產生的 15.7K 的兩倍。
English
We introduce Goedel-Prover, an open-source large language model (LLM) that
achieves the state-of-the-art (SOTA) performance in automated formal proof
generation for mathematical problems. The key challenge in this field is the
scarcity of formalized math statements and proofs, which we tackle in the
following ways. We train statement formalizers to translate the natural
language math problems from Numina into formal language (Lean 4), creating a
dataset of 1.64 million formal statements. LLMs are used to check that the
formal statements accurately preserve the content of the original natural
language problems. We then iteratively build a large dataset of formal proofs
by training a series of provers. Each prover succeeds in proving many
statements that the previous ones could not, and these new proofs are added to
the training set for the next prover. The final prover outperforms all existing
open-source models in whole-proof generation. On the miniF2F benchmark, it
achieves a 57.6% success rate (Pass@32), exceeding the previous best
open-source model by 7.6%. On PutnamBench, Goedel-Prover successfully solves 7
problems (Pass@512), ranking first on the leaderboard. Furthermore, it
generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the
15.7K produced by earlier works.Summary
AI-Generated Summary