StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion
August 6, 2025
Authors: Yutong Wu, Di Huang, Ruosi Wan, Yue Peng, Shijie Shang, Chenrui Cao, Lei Qi, Rui Zhang, Zidong Du, Jie Yan, Xing Hu
cs.AI
Abstract
Autoformalization aims to translate natural-language mathematical statements into a formal language. While LLMs have accelerated progress in this area, existing methods still suffer from low accuracy. We identify two key abilities for effective autoformalization: comprehensive mastery of formal-language domain knowledge, and the reasoning capability to understand natural-language problems and to align informal statements with their formal counterparts. Without the former, a model cannot identify the correct formal objects; without the latter, it struggles to interpret real-world contexts and map them precisely into formal expressions. To address these gaps, we introduce ThinkingF, a data-synthesis and training pipeline that improves both abilities. First, we construct two datasets: one by distilling and selecting large-scale examples rich in formal knowledge, and another by generating informal-to-formal reasoning trajectories guided by expert-designed templates. We then apply supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) on these datasets to further fuse and refine the two abilities. The resulting 7B and 32B models exhibit both comprehensive formal knowledge and strong informal-to-formal reasoning. Notably, StepFun-Formalizer-32B achieves state-of-the-art BEq@1 scores of 40.5% on FormalMATH-Lite and 26.7% on ProverBench, surpassing all prior general-purpose and specialized models.
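
To make the task concrete, here is a minimal illustration of statement-level autoformalization. This is our own toy example, not drawn from the paper's data; we assume Lean 4 with Mathlib as the target formal language, consistent with the FormalMATH and ProverBench benchmarks. The proof body is left as `sorry` because the task produces statements, not proofs.

```lean
import Mathlib

-- Informal statement: "Show that the sum of any two even integers is even."
-- Candidate Lean 4 formalization of the statement; the proof body is
-- intentionally `sorry`, since autoformalization targets statements.
theorem sum_of_two_evens_is_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  sorry
```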
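
For context on the metric: BEq@1 accepts a single sampled formalization only when it can be proven logically equivalent to a reference formalization in both directions. The sketch below shows the kind of bidirectional check involved, using hypothetical reference/candidate statements of our own; the benchmarks' actual checking harness may differ.

```lean
import Mathlib

-- BEq-style equivalence check (toy sketch, our own example).
-- The candidate is phrased differently from the reference and is
-- accepted only if both implications are provable.
example :
    -- candidate: the model's formalization
    (∀ a b : ℤ, Even a → Even b → Even (b + a)) ↔
    -- reference: the benchmark's ground-truth formalization
    (∀ a b : ℤ, Even a → Even b → Even (a + b)) := by
  constructor
  · intro h a b ha hb
    have := h a b ha hb
    rwa [add_comm] at this
  · intro h a b ha hb
    have := h a b ha hb
    rwa [add_comm] at this
```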