컴파일러 피드백을 통한 Lean 증명 수정 학습

초록

신경망 정리 증명기가 점점 더 에이전트 성격을 띠게 됨에 따라 컴파일러 피드백을 해석하고 이에 따라 행동하는 능력이 중요해지고 있습니다. 그러나 기존 Lean 데이터셋은 거의 전적으로 올바른 증명으로만 구성되어 있어 실패를 이해하고 수정하는 데 필요한 지도(supervision)가 거의 제공되지 않습니다. 본 연구는 Lean 증명 수리를 지도 학습 문제로 접근합니다: 오류가 있는 증명과 컴파일러 피드백이 주어졌을 때, 수정된 증명과 동일한 피드백에 기반한 자연어 진단 결과를 동시에 예측하는 것입니다. 우리는 체계적으로 생성된 증명 실패를 컴파일러 진단 메시지 및 이에 정렬된 수리 대상과 설명 대상과 짝 지은 26만 개의 지도 튜플로 구성된 APRIL(Automated Proof Repair in Lean) 데이터셋을 소개합니다. APRIL을 사용하여 언어 모델을 학습시키면 수리 정확도와 피드백 조건부 추론 능력이 크게 향상됩니다. 우리의 단일 시도(single-shot) 증명 수리 평가 환경에서 미세 조정된 40억 매개변수 모델은 가장 강력한 오픈소스 기준 모델을 능가하는 성능을 보였습니다. 우리는 진단 조건부 지도 학습이 피드백을 활용하는 증명기에게 보완적인 훈련 신호로 작용한다고 봅니다. 우리의 데이터셋은 https://huggingface.co/datasets/uw-math-ai/APRIL{해당 링크}에서 이용 가능합니다.

English

As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at https://huggingface.co/datasets/uw-math-ai/APRIL{this link}.

컴파일러 피드백을 통한 Lean 증명 수정 학습

Learning to Repair Lean Proofs from Compiler Feedback

초록

Support