

REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

November 25, 2025
Authors: Chuyi Kong, Gao Wei, Jing Ma, Hongzhan Lin, Yaxin Fan
cs.AI

Abstract

The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that deliver accurate verdicts with interpretable explanations. However, existing large language model (LLM)-based approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, and the responsiveness crucial for real-time use. To address these challenges, we propose REFLEX (REason-guided Fact-checking with Latent EXplanations), a plug-and-play, self-refining paradigm that leverages the internal knowledge of the backbone model to improve both verdict accuracy and explanation quality. REFLEX reformulates fact-checking as a role-play dialogue and jointly trains verdict prediction and explanation generation. It adaptively extracts contrastive activation pairs between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle truth into style and substance. These activation-level signals guide inference and suppress noisy explanations, enabling more faithful and efficient reasoning. Experiments on real-world datasets show that REFLEX outperforms previous methods that steer toward a single truth direction, underscoring the challenge traditional approaches face when handling the subtle, human-unknown truth in fact-checking tasks. Remarkably, with only 465 self-refined training samples, REFLEX achieves state-of-the-art performance. Furthermore, models trained with explanatory objectives can effectively guide those without them, yielding up to a 7.57% improvement and highlighting that internal explanation signals play a dual role in both interpreting and enhancing factual reasoning.
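The core mechanism the abstract describes, building a steering vector from contrastive activation pairs between a backbone model and its fine-tuned variant, then shifting hidden states along that direction at inference, can be sketched as follows. This is a minimal NumPy illustration of activation steering in general, not the authors' implementation: the activation shapes, the mean-difference recipe, and the scaling coefficient `alpha` are all assumptions for exposition.

```python
import numpy as np

def build_steering_vector(acts_backbone, acts_finetuned):
    """Steering vector as the mean difference of contrastive activation pairs.

    Each array has shape (num_pairs, hidden_dim): activations collected at the
    same layer, on the same inputs, from the backbone model and its fine-tuned
    variant. The mean-difference construction is a common activation-steering
    recipe and stands in here for REFLEX's adaptive extraction step.
    """
    return (acts_finetuned - acts_backbone).mean(axis=0)

def steer(hidden_states, steering_vector, alpha=1.0):
    """Shift hidden states along the steering direction at inference time."""
    return hidden_states + alpha * steering_vector

# Toy demonstration: the fine-tuned model's activations differ from the
# backbone's by a fixed shift along the first hidden dimension.
rng = np.random.default_rng(0)
num_pairs, hidden_dim = 16, 8
base = rng.normal(size=(num_pairs, hidden_dim))
shift = np.zeros(hidden_dim)
shift[0] = 1.0
tuned = base + shift  # broadcasts the shift over all pairs

v = build_steering_vector(base, tuned)      # recovers the shift direction
h_steered = steer(np.zeros(hidden_dim), v, alpha=0.5)
```

In a real setting the activations would come from forward hooks on a transformer layer, and `steer` would be applied inside the forward pass; the point of the sketch is only that the steering direction is read off from model-internal activation differences rather than from external knowledge.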
PDF · December 6, 2025