

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

October 13, 2025
Authors: Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
cs.AI

Abstract

Solving math problems in verifiable languages such as Lean has had a significant impact on both the mathematics and computer science communities. Current state-of-the-art models are typically trained with expensive online Reinforcement Learning (RL) or expert iteration. However, these approaches rely on fixed problem sets, which makes training inefficient and limits the models' ability to tackle complex problems. To overcome these limitations, we propose GAR: Generative Adversarial Reinforcement Learning, a comprehensive RL training framework that jointly trains a problem composer and a solver in an adversarial loop. GAR introduces an implicit curriculum-learning mechanism that aligns task difficulty with the prover's evolving capability, thereby improving training efficiency and enabling stronger performance in proving advanced theorems. Experiments show that with GAR training, Goedel-Prover-V2-8B and DeepSeek-Prover-V2-7B achieve an average relative improvement in pass@32 of 4.20% on the MiniF2F-Test benchmark, while DeepSeek-Prover-V2's pass@32 on ProofNet-Test increases from 22.58% to 25.81%. Beyond formal proving, GAR establishes a general RL paradigm for the co-evolution of problem generation and solving in verifiable environments.
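
The abstract describes the adversarial loop only at a high level. The toy Python sketch below illustrates one way such a composer-solver loop with an implicit curriculum could be wired together; the `Composer` and `Solver` classes, the scalar difficulty/skill knobs, the reward shaping, and the simulated verification are all hypothetical placeholders for real policy updates and a real Lean proof checker, not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Problem:
    statement: str     # formal theorem statement (e.g., Lean source text)
    difficulty: float  # composer's scalar difficulty knob (toy stand-in)


class Composer:
    """Toy stand-in for the problem-generator policy."""

    def __init__(self, difficulty: float = 0.3, lr: float = 0.05):
        self.difficulty = difficulty
        self.lr = lr

    def propose(self) -> Problem:
        # A real composer would emit a Lean statement; here we only tag an id.
        return Problem(statement=f"theorem toy_{random.randrange(10**6)} : True",
                       difficulty=self.difficulty)

    def update(self, reward: float) -> None:
        # Stand-in for a policy-gradient step: nudge difficulty by the reward.
        self.difficulty = min(1.0, max(0.0, self.difficulty + self.lr * reward))


class Solver:
    """Toy stand-in for the prover policy."""

    def __init__(self, skill: float = 0.3, lr: float = 0.02):
        self.skill = skill
        self.lr = lr

    def attempt(self, problem: Problem) -> bool:
        # Stand-in for sampling a proof and checking it with Lean: success
        # probability shrinks as difficulty outpaces skill.
        p_success = min(0.95, max(0.02, self.skill - problem.difficulty + 0.5))
        return random.random() < p_success

    def update(self, reward: float) -> None:
        self.skill = min(1.0, max(0.0, self.skill + self.lr * reward))


def adversarial_round(composer: Composer, solver: Solver, k: int = 32) -> bool:
    """One composer-solver interaction with pass@k-style evaluation."""
    problem = composer.propose()
    successes = sum(solver.attempt(problem) for _ in range(k))
    solve_rate = successes / k

    # Solver is rewarded for producing any verified proof among k attempts.
    solver.update(1.0 if successes else -0.1)

    # Composer is rewarded for frontier problems: ones the solver can solve,
    # but only rarely. Unsolved problems are penalized so difficulty cannot
    # run away from the solver; this is what yields the implicit curriculum.
    composer.update(1.0 - solve_rate if successes else -0.5)
    return successes > 0


if __name__ == "__main__":
    random.seed(0)
    composer, solver = Composer(), Solver()
    solved = sum(adversarial_round(composer, solver) for _ in range(300))
    print(f"solver success rate: {solved / 300:.1%}")
    print(f"final difficulty={composer.difficulty:.2f}, skill={solver.skill:.2f}")
```

In this sketch the composer's reward peaks on problems the solver solves only rarely and turns negative on unsolved ones, so the generated difficulty tracks just ahead of the solver's ability over training, which is the curriculum effect the abstract attributes to adversarial co-training.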