Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification
January 30, 2026
Authors: Chuxue Cao, Jinluan Yang, Haoran Li, Kunhao Pan, Zijian Zhao, Zhengyu Chen, Yuchen Tian, Lijun Wu, Conghui He, Sirui Han, Yike Guo
cs.AI
Abstract
Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction leads to logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with natural language generation, providing real-time feedback to detect and rectify errors as they occur. Unlike previous neuro-symbolic methods limited to passive post-hoc validation, our approach actively penalizes intermediate fallacies within the reasoning chain. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning with policy optimization. Extensive evaluation on six benchmarks spanning mathematical, logical, and general reasoning demonstrates that our 7B and 14B models outperform state-of-the-art baselines by average margins of 10.4% and 14.2%, respectively. These results validate formal verification as a scalable mechanism for significantly pushing the performance boundaries of advanced LLM reasoning.
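To make the core idea concrete, the sketch below illustrates what "interleaved verification with penalties for intermediate fallacies" could look like in code, as opposed to scoring only the final answer. It is a minimal toy, not the authors' implementation: the names StepVerdict, verify_step, and interleaved_reward, the arithmetic-only checker, and the penalty constants are all illustrative assumptions; a real system would dispatch checkable steps to a formal prover or solver and feed the resulting penalties into the policy-optimization stage described in the abstract.

```python
# Minimal sketch of interleaved formal verification with reward shaping.
# All names and constants here are hypothetical, for illustration only.
from dataclasses import dataclass
import re


@dataclass
class StepVerdict:
    valid: bool
    message: str


def verify_step(step: str) -> StepVerdict:
    """Toy 'formal' checker: validates simple equalities like '2 + 3 = 5'.
    A real framework would call a symbolic prover/solver here instead."""
    m = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)\s*", step)
    if not m:
        # Only provably wrong steps are penalized; unparsable steps are skipped.
        return StepVerdict(True, "not checkable, skipped")
    a, op, b, claimed = int(m[1]), m[2], int(m[3]), int(m[4])
    actual = {"+": a + b, "-": a - b, "*": a * b}[op]
    if actual == claimed:
        return StepVerdict(True, "verified")
    return StepVerdict(False, f"expected {actual}, got {claimed}")


def interleaved_reward(steps: list[str], final_correct: bool,
                       step_penalty: float = 0.5, final_bonus: float = 1.0) -> float:
    """Reward-shaping sketch: penalize each intermediate fallacy as it occurs,
    rather than relying on passive post-hoc validation of the final answer."""
    reward = final_bonus if final_correct else 0.0
    for i, step in enumerate(steps):
        verdict = verify_step(step)
        if not verdict.valid:
            reward -= step_penalty
            print(f"step {i}: fallacy detected ({verdict.message})")
    return reward


if __name__ == "__main__":
    chain = ["2 + 3 = 5", "5 * 4 = 21", "so the answer is 21"]
    print("reward:", interleaved_reward(chain, final_correct=False))
```

Under these assumptions, the second step ("5 * 4 = 21") is caught and penalized at the point it occurs, which is the behavior the abstract attributes to interleaving verification with generation instead of validating only after the chain is complete.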