ChatPaper.aiChatPaper

CriticLean:基于评论家指导的强化学习用于数学形式化

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

July 8, 2025
作者: Zhongyuan Peng, Yifan Yao, Kaijing Ma, Shuyue Guo, Yizhe Li, Yichi Zhang, Chenchen Zhang, Yifan Zhang, Zhouliang Yu, Luming Li, Minghao Liu, Yihang Xia, Jiawei Shen, Yuchen Wu, Yixin Cao, Zhaoxiang Zhang, Wenhao Huang, Jiaheng Liu, Ge Zhang
cs.AI

摘要

将自然语言数学陈述转化为正式、可执行的代码是自动定理证明中的一个基本挑战。尽管先前的研究主要集中在生成和编译的成功率上,但很少关注批评阶段——即评估生成的形式化是否真正捕捉到原问题的语义意图。本文中,我们引入了CriticLean,一种新颖的批评引导强化学习框架,将批评的角色从被动验证者提升为主动学习组件。具体而言,首先,我们提出了CriticLeanGPT,通过监督微调和强化学习进行训练,以严格评估Lean 4形式化的语义保真度。接着,我们引入了CriticLeanBench,一个旨在衡量模型区分语义正确与错误形式化能力的基准,并展示了我们训练的CriticLeanGPT模型能够显著超越强大的开源和闭源基线。基于CriticLean框架,我们构建了FineLeanCorpus,一个包含超过285K问题的数据集,该数据集展现了丰富的领域多样性、广泛的难度覆盖范围以及基于人类评估的高正确性。总体而言,我们的研究结果表明,优化批评阶段对于生成可靠的形式化至关重要,我们希望CriticLean能为未来形式数学推理的进展提供有价值的见解。
English
Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase-the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, first, we propose the CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 285K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope our CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
PDF381July 9, 2025