GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
October 13, 2025
Authors: Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
cs.AI
Abstract
Solving math problems through verifiable languages such as Lean has
significantly impacted both the mathematics and computer science communities.
Current state-of-the-art models are often trained with expensive online
Reinforcement Learning (RL) or expert iteration. However, these approaches rely
on fixed problem sets, which makes training inefficient and limits the model's
ability to tackle complex problems. To overcome these limitations, we propose
GAR: Generative Adversarial Reinforcement Learning, a comprehensive RL training
framework that jointly trains the problem composer and solver in an adversarial
loop. GAR introduces an implicit curriculum learning mechanism that aligns task
difficulty with the prover's evolving capability, thereby improving training
efficiency and strengthening performance on advanced
theorems. Experiments show that with GAR training, Goedel-Prover-V2-8B and
DeepSeek-Prover-V2-7B achieve an average relative improvement in pass@32 of
4.20% on the MiniF2F-Test benchmark, while DeepSeek-Prover-V2's pass@32 on
ProofNet-Test increases from 22.58% to 25.81%. Beyond formal proving, GAR
establishes a general RL paradigm for the co-evolution of problem generation
and solving in verifiable environments.
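The abstract describes the composer-solver loop only at a high level and leaves the reward design unspecified. As a rough mental model, the adversarial loop might look like the toy sketch below. All names (compose_problems, attempt_proof, lean_verify, gar_loop), the difficulty proxy, and the frontier-style composer reward are our assumptions for illustration, not the paper's actual method or API; in the real framework, verification would be done by Lean and both policies would be updated with RL.

```python
# Hypothetical sketch of the GAR adversarial loop described in the abstract.
# Every function here is an illustrative stand-in, not the paper's API.

import random

def lean_verify(statement: str, proof: str) -> bool:
    """Stand-in for a Lean checker; success rate shrinks with a toy
    'difficulty' encoded in the statement."""
    difficulty = statement.count("!")          # toy difficulty proxy
    return random.random() < max(0.05, 0.9 - 0.2 * difficulty)

def compose_problems(level: int, n: int) -> list[str]:
    """Composer proposes n theorem statements near the target difficulty."""
    return [f"theorem t{i} : P{'!' * level}" for i in range(n)]

def attempt_proof(statement: str) -> str:
    """Solver proposes a candidate proof (placeholder tactic script)."""
    return "by simp"

def gar_loop(rounds: int = 5, batch: int = 32, target: float = 0.5) -> None:
    level = 1
    for r in range(rounds):
        problems = compose_problems(level, batch)
        solved = sum(lean_verify(s, attempt_proof(s)) for s in problems)
        solve_rate = solved / batch
        # Solver reward: verified proofs. Composer reward (assumed form):
        # problems near the frontier of solver ability, i.e. a solve rate
        # close to `target` -- this is what yields the implicit curriculum.
        composer_reward = 1.0 - abs(solve_rate - target)
        # In the real framework both policies would be updated by RL here;
        # this sketch only adjusts a difficulty knob to mimic the curriculum.
        if solve_rate > target:
            level += 1                         # solver got stronger: harden
        print(f"round {r}: level={level} solve_rate={solve_rate:.2f} "
              f"composer_reward={composer_reward:.2f}")

if __name__ == "__main__":
    gar_loop()
```

Under these assumptions, the curriculum is "implicit" in the sense that no difficulty schedule is hand-specified: the composer's reward peaks when its problems sit at the edge of what the solver can currently prove, so difficulty tracks the solver's ability automatically.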