ChatPaper.ai


REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

November 25, 2025
Authors: Chuyi Kong, Gao Wei, Jing Ma, Hongzhan Lin, Yaxin Fan
cs.AI

Abstract

The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model-based (LLM-based) approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, and responsiveness, which is crucial for real-time use. To address these challenges, we propose the REason-guided Fact-checking with Latent EXplanations (REFLEX) paradigm, a plug-and-play, self-refining paradigm that leverages the internal knowledge of the backbone model to improve both verdict accuracy and explanation quality. REFLEX reformulates fact-checking as a role-play dialogue and jointly trains verdict prediction and explanation generation. It adaptively extracts contrastive activation pairs between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle truth into style and substance. These activation-level signals guide inference and suppress noisy explanations, enabling more faithful and efficient reasoning. Experiments on real-world datasets show that REFLEX outperforms previous methods that steer toward a single truth direction, and underscore the challenge traditional approaches face when handling the subtle, human-unknown truth in fact-checking tasks. Remarkably, with only 465 self-refined training samples, REFLEX achieves state-of-the-art performance. Furthermore, models trained with explanatory objectives can effectively guide those without them, yielding up to a 7.57% improvement, highlighting that internal explanation signals play a dual role in both interpreting and enhancing factual reasoning.
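The core mechanism the abstract describes, contrastive steering vectors built from activation differences between a backbone model and its fine-tuned variant, can be illustrated in miniature. The sketch below is not the paper's implementation: the function names, toy activations, and the scalar `alpha` are illustrative assumptions, and numpy arrays stand in for a real model's hidden states collected via framework hooks.

```python
import numpy as np

def build_steering_vector(base_acts, tuned_acts):
    """Contrastive steering vector: mean per-example activation
    difference between the fine-tuned variant and the backbone
    at one chosen layer (rows = prompts, cols = hidden dims)."""
    return np.mean(np.asarray(tuned_acts) - np.asarray(base_acts), axis=0)

def steer(hidden, vector, alpha=1.0):
    """Shift a hidden state along the steering direction at
    inference time; alpha scales the intervention strength."""
    return hidden + alpha * vector

# Toy activations: 4 prompts, hidden size 3 (purely illustrative).
base = np.zeros((4, 3))
tuned = np.ones((4, 3))
v = build_steering_vector(base, tuned)      # direction of the fine-tuned model
h = np.array([0.5, -0.2, 0.1])              # one hidden state during inference
print(steer(h, v, alpha=0.5))
```

In the paper's setting there would be more than one such direction, since REFLEX separates a style component from a substance component rather than steering toward a single truth direction.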
December 6, 2025