Mathesis：邁向從自然語言進行形式化定理證明

摘要

近期大型語言模型的進展展現了在形式推理方面的巨大潛力。然而，大多數基於LLM的定理證明器長期以來受限於需要專家撰寫的形式化陳述作為輸入，這限制了它們在處理以自然語言表達的現實世界問題時的適用性。我們通過Mathesis來解決這一差距，這是第一個處理非正式問題陳述的端到端定理證明流程。它貢獻了Mathesis-Autoformalizer，這是首個利用強化學習來增強自然語言問題形式化能力的自動形式化工具，並輔以我們新穎的LeanScorer框架進行細緻的形式化質量評估。此外，它還提出了Mathesis-Prover，用於從形式化陳述生成形式化證明。為了評估端到端形式定理證明的實際應用性，我們引入了Gaokao-Formal，這是一個包含488道來自中國高考的複雜問題的基準測試集。我們的方法經過精心設計，對每個組件進行了深入研究。實驗證明了Mathesis的有效性，其中自動形式化工具在Gaokao-Formal上的通過率比最佳基線高出22%。整個系統超越了其他模型組合，在MiniF2F上以pass@32達到了64%的準確率，並在Gaokao-Formal上取得了18%的領先水平。

English

Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline processing informal problem statements. It contributes Mathesis-Autoformalizer, the first autoformalizer using reinforcement learning to enhance the formalization ability of natural language problems, aided by our novel LeanScorer framework for nuanced formalization quality assessment. It also proposes a Mathesis-Prover, which generates formal proofs from the formalized statements. To evaluate the real-world applicability of end-to-end formal theorem proving, we introduce Gaokao-Formal, a benchmark of 488 complex problems from China's national college entrance exam. Our approach is carefully designed, with a thorough study of each component. Experiments demonstrate Mathesis's effectiveness, with the autoformalizer outperforming the best baseline by 22% in pass-rate on Gaokao-Formal. The full system surpasses other model combinations, achieving 64% accuracy on MiniF2F with pass@32 and a state-of-the-art 18% on Gaokao-Formal.

Mathesis：邁向從自然語言進行形式化定理證明

Mathesis: Towards Formal Theorem Proving from Natural Languages

摘要

Support