LeanDojo: 検索拡張言語モデルを用いた定理証明

要旨

大規模言語モデル（LLM）は、Leanなどの証明アシスタントを使用して形式的な定理を証明する際に有望な成果を示しています。しかし、既存の手法は、非公開のコードやデータ、そして大規模な計算リソースを必要とするため、再現や拡張が困難です。これにより、定理証明のための機械学習手法に関する研究に大きな障壁が生じています。本論文では、これらの障壁を取り除くために、LeanDojoを紹介します。LeanDojoは、ツールキット、データ、モデル、ベンチマークからなるオープンソースのLeanプレイグラウンドです。LeanDojoはLeanからデータを抽出し、プログラム的に証明環境と対話することを可能にします。証明内の前提条件の細かいアノテーションを含んでおり、定理証明の主要なボトルネックである前提選択のための貴重なデータを提供します。このデータを使用して、我々はReProver（Retrieval-Augmented Prover）を開発しました。ReProverは、広大な数学ライブラリから前提を選択するために検索機能を強化した初めてのLLMベースの証明器です。これは低コストであり、わずか1週間のGPUトレーニングのみを必要とします。我々の検索器は、LeanDojoのプログラム解析能力を活用してアクセス可能な前提と困難なネガティブ例を特定し、検索を大幅に効果的にします。さらに、Leanの数学ライブラリから抽出された96,962の定理と証明からなる新しいベンチマークを構築しました。これは、トレーニング中に使用されなかった新しい前提に依存する定理に一般化することを証明器に要求する挑戦的なデータ分割を特徴としています。我々はこのベンチマークをトレーニングと評価に使用し、実験結果はReProverが非検索ベースラインやGPT-4を上回る有効性を示しています。これにより、我々はプロプライエタリなデータセットを一切使用しない初めてのオープンソースLLMベースの定理証明器セットを提供し、さらなる研究を促進するために寛容なMITライセンスの下で公開します。

English

Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.

LeanDojo: 検索拡張言語モデルを用いた定理証明

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

要旨

Support