

MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

October 27, 2025
Authors: Xinming Wang, Jian Xu, Bin Yu, Sheng Lian, Hongzhu Yi, Yi Chen, Yingjian Zhu, Boran Wang, Hongming Yang, Han Hu, Xu-Yao Zhang, Cheng-Lin Liu
cs.AI

Abstract

Large reasoning models (LRMs) show strong capabilities in complex reasoning, yet their marginal gains on evidence-dependent factual questions are limited. We find this limitation is partially attributable to a reasoning-answer hit gap, where the model identifies the correct facts during reasoning but fails to incorporate them into the final response, thereby reducing factual fidelity. To address this issue, we propose MR-ALIGN, a Meta-Reasoning informed alignment framework that enhances factuality without relying on external verifiers. MR-ALIGN quantifies state transition probabilities along the model's thinking process and constructs a transition-aware implicit reward that reinforces beneficial reasoning patterns while suppressing defective ones at the atomic thinking segments. This re-weighting reshapes token-level signals into probability-aware segment scores, encouraging coherent reasoning trajectories that are more conducive to factual correctness. Empirical evaluations across four factual QA datasets and one long-form factuality benchmark show that MR-ALIGN consistently improves accuracy and truthfulness while reducing misleading reasoning. These results highlight that aligning the reasoning process itself, rather than merely the outputs, is pivotal for advancing factuality in LRMs.
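The core mechanism described above, reshaping token-level signals into probability-aware segment scores via a transition-aware re-weighting, can be illustrated with a minimal sketch. The abstract does not specify the state taxonomy, how transition probabilities are estimated, or the exact reward form, so the state labels, the `TRANSITIONS` table, and the baseline normalization below are all hypothetical placeholders, not the paper's actual implementation:

```python
# Hedged sketch: transition-aware re-weighting of token-level signals into
# segment scores. State names and probabilities are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Segment:
    token_rewards: list[float]  # token-level alignment signals for this segment
    state: str                  # hypothetical reasoning-state label

# Hypothetical transition probabilities P(cur_state | prev_state), imagined as
# estimated from the model's own thinking traces.
TRANSITIONS = {
    ("recall", "verify"): 0.7,   # beneficial: checking a recalled fact
    ("verify", "answer"): 0.8,   # beneficial: committing the verified fact
    ("recall", "digress"): 0.1,  # defective: drifting away from the evidence
}

def segment_scores(segments, baseline=0.5):
    """Turn token-level rewards into segment scores weighted by how likely
    each segment's state transition is: transitions above the baseline are
    reinforced, transitions below it (or unseen) are suppressed."""
    scores = []
    for prev, cur in zip(segments, segments[1:]):
        p = TRANSITIONS.get((prev.state, cur.state), 0.0)
        mean_reward = sum(cur.token_rewards) / len(cur.token_rewards)
        scores.append(mean_reward * (p / baseline))
    return scores
```

Under this sketch, a "recall → verify → answer" trajectory earns amplified segment scores, while a "recall → digress" step is down-weighted, which is the qualitative behavior the abstract attributes to the implicit reward.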