ChatPaper.aiChatPaper

迈向通过解耦推理与证明解决更具挑战性的国际数学奥林匹克问题

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

July 7, 2025
作者: Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu
cs.AI

摘要

在形式语言中的自动定理证明(ATP)是人工智能领域的一项基础性挑战。尽管大型语言模型(LLMs)已推动显著进展,但其强大的非形式推理能力与薄弱的正式证明性能之间仍存在显著差距。近期研究表明,在如PutnamBench等基准测试中,非形式准确率超过80%,而正式成功率却低于8%。我们认为,这一差距之所以持续存在,是因为当前最先进的证明器通过紧密耦合推理与证明,采用了无意中惩罚深度推理、偏向于浅层战术策略的训练范式。为弥合这一根本性差距,我们提出了一种新颖框架,将高层推理与低层证明生成解耦。我们的方法利用两个独立且专门化的模型:一个强大的通用推理器,用于生成多样化的战略性子目标引理;以及一个高效的证明器,用于严格验证这些引理。这种模块化设计释放了模型的全部推理潜力,并规避了端到端训练的陷阱。我们在2000年后国际数学奥林匹克(IMO)难题集上评估了该方法,这是一个此前未有开源证明器报告成功的题目集。我们的解耦框架成功解决了其中5道问题,展示了在应对极其复杂的数学挑战方面自动推理的重大进展。为促进未来研究,我们发布了针对广泛IMO问题生成并验证的引理完整数据集,访问地址为https://tencent-imo.github.io/。
English
Automated Theorem Proving (ATP) in formal languages is a foundational challenge for AI. While Large Language Models (LLMs) have driven remarkable progress, a significant gap remains between their powerful informal reasoning capabilities and their weak formal proving performance. Recent studies show that the informal accuracy exceeds 80% while formal success remains below 8% on benchmarks like PutnamBench. We argue this gap persists because current state-of-the-art provers, by tightly coupling reasoning and proving, are trained with paradigms that inadvertently punish deep reasoning in favor of shallow, tactic-based strategies. To bridge this fundamental gap, we propose a novel framework that decouples high-level reasoning from low-level proof generation. Our approach utilizes two distinct, specialized models: a powerful, general-purpose Reasoner to generate diverse, strategic subgoal lemmas, and an efficient Prover to rigorously verify them. This modular design liberates the model's full reasoning potential and bypasses the pitfalls of end-to-end training. We evaluate our method on a challenging set of post-2000 IMO problems, a problem set on which no prior open-source prover has reported success. Our decoupled framework successfully solves 5 of these problems, demonstrating a significant step towards automated reasoning on exceptionally difficult mathematical challenges. To foster future research, we release our full dataset of generated and verified lemmas for a wide range of IMO problems, available at https://tencent-imo.github.io/ .
PDF131July 10, 2025