MPS-Prover:通过多视角搜索与数据精炼推进逐步定理证明
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
May 16, 2025
作者: Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu
cs.AI
摘要
在形式语言中的自动定理证明(ATP)仍然是人工智能领域的一项艰巨挑战,它要求严格的逻辑推理并需在庞大的搜索空间中导航。尽管大型语言模型(LLMs)已展现出令人鼓舞的表现,现有的逐步证明器常因搜索引导的偏差而效率低下,导致证明策略不够优化。本文介绍了多视角搜索证明器(MPS-Prover),一种新颖的逐步ATP系统,旨在克服这些局限。MPS-Prover融合了两项关键创新:一是高效的训练后数据筛选策略,能在不牺牲性能的前提下削减约40%的冗余训练数据;二是多视角树搜索机制。该机制将学习到的评判模型与精心设计的启发式规则相结合,以多样化策略选择,避免陷入无效状态,并增强搜索的鲁棒性。广泛的评估表明,MPS-Prover在包括miniF2F和ProofNet在内的多个挑战性基准测试中达到了最先进的性能,超越了之前拥有70亿参数的模型。此外,我们的分析揭示,与现有的逐步及全证明方法相比,MPS-Prover生成的证明显著更短且更具多样性,凸显了其高效性与有效性。我们的研究推进了基于LLM的形式推理能力,并为开发更强大的定理证明器提供了一个稳健的框架和全面的分析。
English
Automated Theorem Proving (ATP) in formal languages remains a formidable
challenge in AI, demanding rigorous logical deduction and navigating vast
search spaces. While large language models (LLMs) have shown promising
performance, existing stepwise provers often suffer from biased search
guidance, leading to inefficiencies and suboptimal proof strategies. This paper
introduces the Multi-Perspective Search Prover (MPS-Prover), a novel stepwise
ATP system designed to overcome these limitations. MPS-Prover incorporates two
key innovations: a highly effective post-training data curation strategy that
prunes approximately 40% of redundant training data without sacrificing
performance, and a multi-perspective tree search mechanism. This search
integrates a learned critic model with strategically designed heuristic rules
to diversify tactic selection, prevent getting trapped in unproductive states,
and enhance search robustness. Extensive evaluations demonstrate that
MPS-Prover achieves state-of-the-art performance on multiple challenging
benchmarks, including miniF2F and ProofNet, outperforming prior 7B parameter
models. Furthermore, our analyses reveal that MPS-Prover generates
significantly shorter and more diverse proofs compared to existing stepwise and
whole-proof methods, highlighting its efficiency and efficacy. Our work
advances the capabilities of LLM-based formal reasoning and offers a robust
framework and a comprehensive analysis for developing more powerful theorem
provers.Summary
AI-Generated Summary