CriticLean：基於批評者指導的強化學習用於數學形式化

摘要

將自然語言數學陳述轉譯為可執行的正式程式碼，是自動化定理證明中的一項基本挑戰。儘管先前的研究主要集中在生成與編譯的成功率上，但對於批評階段——即評估生成的形式化是否真正捕捉到原問題的語意意圖——卻鮮少關注。本文中，我們介紹了CriticLean，一個新穎的批評引導強化學習框架，將批評的角色從被動驗證者提升為主動學習組件。具體而言，首先，我們提出了CriticLeanGPT，通過監督式微調與強化學習進行訓練，以嚴格評估Lean 4形式化的語意忠實度。接著，我們引入了CriticLeanBench，一個旨在衡量模型區分語意正確與錯誤形式化能力的基準，並展示了我們訓練的CriticLeanGPT模型能夠顯著超越強大的開源與閉源基線。基於CriticLean框架，我們構建了FineLeanCorpus，一個包含超過285K問題的數據集，展現了豐富的領域多樣性、廣泛的難度覆蓋範圍以及基於人類評估的高正確性。總體而言，我們的研究結果強調了優化批評階段對於產生可靠形式化的重要性，並希望我們的CriticLean能為未來形式數學推理的進展提供寶貴的洞見。

English

Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase-the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, first, we propose the CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 285K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation. Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations, and we hope our CriticLean will provide valuable insights for future advances in formal mathematical reasoning.

CriticLean：基於批評者指導的強化學習用於數學形式化

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

摘要

Support