CriticLean:基于评论家指导的强化学习用于数学形式化
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
July 8, 2025
作者: Zhongyuan Peng, Yifan Yao, Kaijing Ma, Shuyue Guo, Yizhe Li, Yichi Zhang, Chenchen Zhang, Yifan Zhang, Zhouliang Yu, Luming Li, Minghao Liu, Yihang Xia, Jiawei Shen, Yuchen Wu, Yixin Cao, Zhaoxiang Zhang, Wenhao Huang, Jiaheng Liu, Ge Zhang
cs.AI
摘要
将自然语言数学陈述转化为正式、可执行的代码是自动定理证明中的一个基本挑战。尽管先前的研究主要集中在生成和编译的成功率上,但很少关注批评阶段——即评估生成的形式化是否真正捕捉到原问题的语义意图。本文中,我们引入了CriticLean,一种新颖的批评引导强化学习框架,将批评的角色从被动验证者提升为主动学习组件。具体而言,首先,我们提出了CriticLeanGPT,通过监督微调和强化学习进行训练,以严格评估Lean 4形式化的语义保真度。接着,我们引入了CriticLeanBench,一个旨在衡量模型区分语义正确与错误形式化能力的基准,并展示了我们训练的CriticLeanGPT模型能够显著超越强大的开源和闭源基线。基于CriticLean框架,我们构建了FineLeanCorpus,一个包含超过285K问题的数据集,该数据集展现了丰富的领域多样性、广泛的难度覆盖范围以及基于人类评估的高正确性。总体而言,我们的研究结果表明,优化批评阶段对于生成可靠的形式化至关重要,我们希望CriticLean能为未来形式数学推理的进展提供有价值的见解。
English
Translating natural language mathematical statements into formal, executable
code is a fundamental challenge in automated theorem proving. While prior work
has focused on generation and compilation success, little attention has been
paid to the critic phase-the evaluation of whether generated formalizations
truly capture the semantic intent of the original problem. In this paper, we
introduce CriticLean, a novel critic-guided reinforcement learning framework
that elevates the role of the critic from a passive validator to an active
learning component. Specifically, first, we propose the CriticLeanGPT, trained
via supervised fine-tuning and reinforcement learning, to rigorously assess the
semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench,
a benchmark designed to measure models' ability to distinguish semantically
correct from incorrect formalizations, and demonstrate that our trained
CriticLeanGPT models can significantly outperform strong open- and
closed-source baselines. Building on the CriticLean framework, we construct
FineLeanCorpus, a dataset comprising over 285K problems that exhibits rich
domain diversity, broad difficulty coverage, and high correctness based on
human evaluation. Overall, our findings highlight that optimizing the critic
phase is essential for producing reliable formalizations, and we hope our
CriticLean will provide valuable insights for future advances in formal
mathematical reasoning.