炼金术:通过符号突变增强定理证明能力
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
October 21, 2024
作者: Shaonan Wu, Shuai Lu, Yeyun Gong, Nan Duan, Ping Wei
cs.AI
摘要
即使对经验丰富的专家来说,编写形式化证明也是具有挑战性的。最近神经定理证明(NTP)领域的进展显示出加快这一过程的潜力。然而,互联网上可用的形式化语料库相对于一般文本来说是有限的,这给NTP带来了重要的数据稀缺挑战。为了解决这一问题,本研究提出了Alchemy,一个通用的数据合成框架,通过符号变异构建形式化定理。具体而言,对于Mathlib中的每个候选定理,我们确定所有可调用的定理,这些定理可以用于重写或应用于它。随后,我们通过用等价形式或前提替换陈述中的相应术语来变异候选定理。因此,我们的方法将Mathlib中的定理数量增加了一个数量级,从110k增加到6M。此外,我们对这个增广语料库进行持续的预训练和监督微调,用于大型语言模型。实验结果表明了我们方法的有效性,在Leandojo基准测试中实现了5%的绝对性能提升。此外,我们的合成数据在超出分布的miniF2F基准测试中获得了2.5%的绝对性能提升。为了提供更多见解,我们对合成数据的构成和训练范式进行了全面分析,为开发强大的定理证明器提供了有价值的指导。
English
Formal proofs are challenging to write even for experienced experts. Recent
progress in Neural Theorem Proving (NTP) shows promise in expediting this
process. However, the formal corpora available on the Internet are limited
compared to the general text, posing a significant data scarcity challenge for
NTP. To address this issue, this work proposes Alchemy, a general framework for
data synthesis that constructs formal theorems through symbolic mutation.
Specifically, for each candidate theorem in Mathlib, we identify all invocable
theorems that can be used to rewrite or apply to it. Subsequently, we mutate
the candidate theorem by replacing the corresponding term in the statement with
its equivalent form or antecedent. As a result, our method increases the number
of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore,
we perform continual pretraining and supervised finetuning on this augmented
corpus for large language models. Experimental results demonstrate the
effectiveness of our approach, achieving a 5% absolute performance improvement
on Leandojo benchmark. Additionally, our synthetic data achieve a 2.5% absolute
performance gain on the out-of-distribution miniF2F benchmark. To provide
further insights, we conduct a comprehensive analysis of synthetic data
composition and the training paradigm, offering valuable guidance for
developing a strong theorem prover.Summary
AI-Generated Summary