ChatPaper.ai


LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

June 27, 2023
作者: Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar
cs.AI

Abstract

Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features a challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
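The retrieval step the abstract describes can be illustrated with a minimal sketch: embed the current proof state and each accessible premise, rank the premises by similarity, and hand the top-k to the tactic generator. This is a toy bag-of-words version under stated assumptions, not ReProver's actual dense retriever (which uses learned neural embeddings); `embed`, `cosine`, and `retrieve_premises` are hypothetical names introduced here for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_premises(proof_state, premises, k=2):
    """Rank candidate premises against the current proof state and
    return the top-k, mimicking the retrieval step before generation."""
    state_vec = embed(proof_state)
    ranked = sorted(premises, key=lambda p: cosine(state_vec, embed(p)),
                    reverse=True)
    return ranked[:k]


premises = [
    "theorem add_comm : a + b = b + a",
    "theorem mul_assoc : a * b * c = a * (b * c)",
    "theorem add_assoc : a + b + c = a + (b + c)",
]
state = "goal : a + b = b + a"
print(retrieve_premises(state, premises, k=1))
# The commutativity lemma ranks first, since its tokens overlap the goal most.
```

In the real system, the candidate set is restricted to premises that are actually accessible at that point in the proof (identified via LeanDojo's program analysis), and the hard negatives used in training are high-similarity premises that are nonetheless irrelevant.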
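The abstract also notes that LeanDojo enables programmatic interaction with the proof environment. The general shape of that loop — obtain a theorem's initial state, apply a tactic string, observe the new state or an error — can be sketched with a stand-in class. All names here (`ToyDojo`, `run_tac`) are hypothetical illustrations and do not reproduce LeanDojo's real API, which runs tactics inside an actual Lean process.

```python
class ToyDojo:
    """Hypothetical stand-in for a proof environment. The theorem counts
    as proved once every tactic of its (known) proof has been applied
    in order; any other tactic yields an error."""

    def __init__(self, theorem, proof_tactics):
        self.theorem = theorem
        self._remaining = list(proof_tactics)

    def run_tac(self, tactic):
        """Apply a tactic string; return the resulting state description."""
        if self._remaining and tactic == self._remaining[0]:
            self._remaining.pop(0)
            if not self._remaining:
                return "proved"
            return f"{len(self._remaining)} goals left"
        return "error"


# A prover (LLM-based or otherwise) drives this loop: propose a tactic,
# observe the environment's response, and continue until "proved".
dojo = ToyDojo("a + b = b + a", ["rw [add_comm]", "rfl"])
print(dojo.run_tac("simp"))            # a wrong tactic is rejected
print(dojo.run_tac("rw [add_comm]"))   # progress: one goal remains
print(dojo.run_tac("rfl"))             # proof complete
```

This interface is what makes benchmarked evaluation possible: a candidate prover is judged by whether its tactic sequence drives the environment to a proved state, not by string-matching a reference proof.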