MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

October 27, 2025
Authors: Xinming Wang, Jian Xu, Bin Yu, Sheng Lian, Hongzhu Yi, Yi Chen, Yingjian Zhu, Boran Wang, Hongming Yang, Han Hu, Xu-Yao Zhang, Cheng-Lin Liu
cs.AI

Abstract

Large reasoning models (LRMs) show strong capabilities in complex reasoning, yet their marginal gains on evidence-dependent factual questions are limited. We find this limitation is partially attributable to a reasoning-answer hit gap, where the model identifies the correct facts during reasoning but fails to incorporate them into the final response, thereby reducing factual fidelity. To address this issue, we propose MR-ALIGN, a Meta-Reasoning informed alignment framework that enhances factuality without relying on external verifiers. MR-ALIGN quantifies state transition probabilities along the model's thinking process and constructs a transition-aware implicit reward that reinforces beneficial reasoning patterns while suppressing defective ones at the level of atomic thinking segments. This re-weighting reshapes token-level signals into probability-aware segment scores, encouraging coherent reasoning trajectories that are more conducive to factual correctness. Empirical evaluations across four factual QA datasets and one long-form factuality benchmark show that MR-ALIGN consistently improves accuracy and truthfulness while reducing misleading reasoning. These results highlight that aligning the reasoning process itself, rather than merely the outputs, is pivotal for advancing factuality in LRMs.
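
The sketch below illustrates the re-weighting idea the abstract describes: collapsing token-level signals into segment scores weighted by how probable the observed state transition is. It is not the authors' implementation; the segment labels ("helpful", "neutral", "defective"), the transition table, and the sign/weighting rule are hypothetical assumptions used only to make the mechanism concrete.

```python
# Minimal sketch (assumptions, not the paper's code): turn per-token signals
# within atomic thinking segments into transition-aware segment scores, so
# that likely beneficial transitions are reinforced and defective segments
# are suppressed.
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    state: str                   # hypothetical label for an atomic thinking segment
    token_scores: List[float]    # token-level signals inside the segment


# Assumed transition probabilities P(next_state | current_state); the values
# here are illustrative placeholders, not estimates from the paper.
TRANSITIONS = {
    ("helpful", "helpful"): 0.7, ("helpful", "neutral"): 0.2, ("helpful", "defective"): 0.1,
    ("neutral", "helpful"): 0.4, ("neutral", "neutral"): 0.4, ("neutral", "defective"): 0.2,
    ("defective", "helpful"): 0.3, ("defective", "neutral"): 0.3, ("defective", "defective"): 0.4,
}


def transition_aware_scores(segments: List[Segment], start_prob: float = 1.0) -> List[float]:
    """Re-weight each segment's mean token score by the probability of the
    transition that led into it; defective segments get a negative sign so
    they are penalized rather than rewarded (an illustrative design choice)."""
    scores = []
    prev_state = None
    for seg in segments:
        base = sum(seg.token_scores) / max(len(seg.token_scores), 1)
        weight = start_prob if prev_state is None else TRANSITIONS.get((prev_state, seg.state), 0.0)
        sign = -1.0 if seg.state == "defective" else 1.0
        scores.append(sign * weight * base)
        prev_state = seg.state
    return scores


if __name__ == "__main__":
    trace = [
        Segment("helpful", [0.9, 0.8]),    # e.g., the correct fact is identified
        Segment("defective", [0.5, 0.2]),  # e.g., the fact is dropped before the answer
        Segment("helpful", [0.7, 0.9]),
    ]
    print(transition_aware_scores(trace))
```

In this toy setup the second segment is both down-weighted (its incoming transition is rare) and sign-flipped (its state is labeled defective), which mirrors, under the stated assumptions, how a transition-aware implicit reward could favor coherent, fact-preserving trajectories.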