When to Trust Context: Self-Reflective Debates for Context Reliability
June 6, 2025
Authors: Zeqi Zhou, Fang Wu, Shayan Talaei, Haokai Zhao, Cheng Meixin, Tinson Xu, Amin Saberi, Yejin Choi
cs.AI
Abstract
Large language models frequently encounter conflicts between their parametric
knowledge and contextual input, often resulting in factual inconsistencies or
hallucinations. We propose Self-Reflective Debate for Contextual Reliability
(SR-DCR), a lightweight framework that integrates token-level self-confidence
with an asymmetric multi-agent debate to adjudicate such conflicts. A critic,
deprived of context, challenges a defender who argues from the given passage; a
judge model evaluates the debate and determines the context's reliability. The
final answer is selected by combining the verdict with model confidence.
Experiments on the ClashEval benchmark demonstrate that SR-DCR consistently
enhances robustness to misleading context while maintaining accuracy on
trustworthy inputs, outperforming both classical debate and confidence-only
baselines with minimal computational overhead. The code is available at
https://github.com/smiles724/Self-Reflective-Debates.
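
The decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the mean-probability confidence measure, and the confidence threshold are all assumptions introduced here for clarity.

```python
import math

def self_confidence(answer_logprobs):
    """Token-level self-confidence: mean probability of the answer tokens.

    A simple illustrative choice; the paper's exact confidence measure
    may differ.
    """
    probs = [math.exp(lp) for lp in answer_logprobs]
    return sum(probs) / len(probs)

def sr_dcr_decide(parametric_answer, contextual_answer,
                  judge_says_context_reliable, confidence,
                  confidence_threshold=0.8):
    """Combine the debate judge's verdict with model self-confidence.

    - If the judge deems the context reliable, answer from the context.
    - Otherwise fall back to the parametric (context-free) answer, unless
      the model's self-confidence in it is low, in which case the context
      is still preferred.
    The threshold value is a hypothetical placeholder.
    """
    if judge_says_context_reliable:
        return contextual_answer
    if confidence >= confidence_threshold:
        return parametric_answer
    return contextual_answer

# Example: the judge rejects the context, and the model is highly
# confident in its parametric answer, so the context is overridden.
conf = self_confidence([-0.05, -0.02])  # near-certain answer tokens
choice = sr_dcr_decide("Paris", "Lyon",
                       judge_says_context_reliable=False,
                       confidence=conf)
```

Here the asymmetry of the debate (a context-free critic versus a context-grounded defender) is what produces the judge's verdict; the sketch above only covers the final selection step that combines that verdict with confidence.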