
Learning to Repair Lean Proofs from Compiler Feedback

February 3, 2026
Authors: Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin
cs.AI

Abstract

As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at https://huggingface.co/datasets/uw-math-ai/APRIL.
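To make the supervision format concrete, the tuple structure the abstract describes — (erroneous proof, compiler feedback, repair target, explanation target) — can be illustrated with a small hypothetical example. The Lean snippet below is our own sketch, not drawn from the APRIL dataset itself:

```lean
-- Erroneous proof (input): `rfl` fails here because `0 + n` does not
-- reduce definitionally to `n` (Nat.add recurses on its second argument).
example (n : Nat) : 0 + n = n := by
  rfl
-- Compiler feedback (paired with the proof): a tactic-failure message
-- reporting that `rfl` could not close the goal `0 + n = n`.

-- Repair target (output): appeal to the library lemma instead.
example (n : Nat) : 0 + n = n := by
  exact Nat.zero_add n

-- Explanation target (output, natural language): "`rfl` only closes goals
-- that hold by definitional unfolding; `0 + n` is not definitionally `n`,
-- so the proof must use `Nat.zero_add`."
```

A model trained on such tuples sees the failing proof together with the diagnostic, and must produce both the corrected proof and the explanation grounded in that diagnostic.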