コンパイラのフィードバックからLean証明の修復を学習する

要旨

ニューラル定理証明器が自律性を増すにつれ、コンパイラフィードバックを解釈し行動する能力が重要となっている。しかし、既存のLeanデータセットはほぼ完全に正しい証明で構成されており、失敗の理解と修正に対する指導がほとんどない。本研究では、Leanの証明修正を教師あり学習問題として扱う：誤った証明とコンパイラフィードバックが与えられた時、修正された証明と、同じフィードバックに基づいた自然言語による診断の両方を予測する。我々はAPRIL（Automated Proof Repair in Lean）を導入する。これは体系的に生成された証明失敗とコンパイラ診断、整合性のある修正目標と説明目標をペアにした26万組の教師付きタプルからなるデータセットである。APRILで言語モデルを訓練すると、修正精度とフィードバック条件付き推論が大幅に改善され、単発修正評価設定において、ファインチューニングされた40億パラメータモデルが最強のオープンソースベースラインを上回った。診断条件付き教師信号は、フィードバックを利用する証明器に対する相補的な訓練信号と捉えられる。データセットはhttps://huggingface.co/datasets/uw-math-ai/APRIL{this link}で公開されている。

English

As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at https://huggingface.co/datasets/uw-math-ai/APRIL{this link}.

コンパイラのフィードバックからLean証明の修復を学習する

Learning to Repair Lean Proofs from Compiler Feedback

要旨

Support