

Learning to Repair Lean Proofs from Compiler Feedback

February 3, 2026
作者: Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin
cs.AI

Abstract

As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at https://huggingface.co/datasets/uw-math-ai/APRIL.
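To make the tuple structure concrete, the following Lean 4 sketch illustrates what one (erroneous proof, compiler feedback, repair target, explanation target) example might look like. The specific error, lemma, and wording are illustrative assumptions, not drawn from the APRIL dataset itself:

```lean
-- Erroneous proof: `rfl` fails here because `n + 1` and `1 + n`
-- are not definitionally equal for a variable `n : Nat`.
example (n : Nat) : n + 1 = 1 + n := rfl

-- Compiler feedback (paraphrased):
--   type mismatch: `rfl` proves `n + 1 = n + 1`,
--   but the goal is `n + 1 = 1 + n`

-- Repair target: close the goal via commutativity of addition.
example (n : Nat) : n + 1 = 1 + n := Nat.add_comm n 1

-- Explanation target (natural-language diagnosis):
--   "The two sides are equal propositionally but not definitionally,
--    so `rfl` cannot close the goal; apply `Nat.add_comm` instead."
```

In this framing, the model is conditioned on the broken proof and the compiler message, and is trained to emit both the corrected proof and the diagnosis.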