

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

June 27, 2023
Authors: Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar
cs.AI

Abstract

Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes those barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables programmatic interaction with the proof environment. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover augmented with retrieval for selecting premises from a vast math library. It is inexpensive, requiring only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features a challenging data split requiring the prover to generalize to theorems relying on novel premises never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
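The abstract describes retrieval-augmented premise selection: given the current proof state, ReProver ranks premises from the math library by similarity before generating a tactic. The sketch below illustrates that ranking step only, using a toy bag-of-words embedding and cosine similarity in place of the paper's trained Transformer encoder; the `embed`, `retrieve`, and example premise names are illustrative assumptions, not the actual ReProver API.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; ReProver uses a trained
    # Transformer encoder instead (this is a stand-in for illustration).
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(state, premises, k=2):
    # Rank candidate premises by similarity to the proof state
    # and return the top-k; a real system would restrict `premises`
    # to those accessible at this point in the file.
    s = embed(state)
    ranked = sorted(premises, key=lambda p: cosine(s, embed(p)), reverse=True)
    return ranked[:k]

premises = [
    "nat.add_comm : a + b = b + a",
    "nat.mul_comm : a * b = b * a",
    "list.length_append : (l1 ++ l2).length = l1.length + l2.length",
]
state = "⊢ a + b = b + a"
print(retrieve(state, premises, k=1))  # the addition-commutativity premise ranks first
```

The retrieved premises would then be concatenated with the proof state and fed to the tactic generator, which is the step this sketch omits.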