

When to Trust Context: Self-Reflective Debates for Context Reliability

June 6, 2025
作者: Zeqi Zhou, Fang Wu, Shayan Talaei, Haokai Zhao, Cheng Meixin, Tinson Xu, Amin Saberi, Yejin Choi
cs.AI

Abstract

Large language models frequently encounter conflicts between their parametric knowledge and contextual input, often resulting in factual inconsistencies or hallucinations. We propose Self-Reflective Debate for Contextual Reliability (SR-DCR), a lightweight framework that integrates token-level self-confidence with an asymmetric multi-agent debate to adjudicate such conflicts. A critic, deprived of context, challenges a defender who argues from the given passage; a judge model evaluates the debate and determines the context's reliability. The final answer is selected by combining the verdict with model confidence. Experiments on the ClashEval benchmark demonstrate that SR-DCR consistently enhances robustness to misleading context while maintaining accuracy on trustworthy inputs, outperforming both classical debate and confidence-only baselines with minimal computational overhead. The code is available at https://github.com/smiles724/Self-Reflective-Debates.
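
The abstract describes SR-DCR's decision flow in enough detail to sketch it. The sketch below is a minimal illustration, not the authors' implementation: the `LLM` interface, every method name, the confidence threshold, and the fallback rule when the judge rejects the context are all assumptions, since the abstract does not spell out the exact combination of verdict and confidence. The linked repository contains the actual code.

```python
from typing import Optional, Protocol


class LLM(Protocol):
    """Hypothetical interface; the authors' actual prompts and APIs may differ."""

    def answer_with_confidence(self, question: str) -> tuple[str, float]: ...
    def answer(self, question: str, context: Optional[str] = None) -> str: ...
    def argue(self, question: str, stance: str, context: Optional[str] = None) -> str: ...
    def judge(self, question: str, defender_case: str, critic_case: str) -> bool: ...


def sr_dcr(question: str, context: str, llm: LLM, conf_threshold: float = 0.9) -> str:
    # Closed-book answer plus token-level self-confidence
    # (e.g., derived from token log-probabilities).
    parametric_answer, confidence = llm.answer_with_confidence(question)

    # Answer grounded in the given passage.
    contextual_answer = llm.answer(question, context=context)

    # Asymmetric debate: the defender argues from the passage,
    # while the critic is deprived of context.
    defender_case = llm.argue(question, stance=contextual_answer, context=context)
    critic_case = llm.argue(question, stance=parametric_answer, context=None)

    # A judge model evaluates the debate and rules on the context's reliability.
    context_is_reliable = llm.judge(question, defender_case, critic_case)

    # Combine the verdict with model confidence: trust the passage when the
    # judge deems it reliable; otherwise fall back to the parametric answer
    # only when self-confidence is high. (Assumed rule; the abstract does not
    # specify how verdict and confidence are combined.)
    if context_is_reliable:
        return contextual_answer
    return parametric_answer if confidence >= conf_threshold else contextual_answer
```

One design point worth noting from the abstract: the debate is asymmetric by construction. Withholding the context from the critic forces it to argue purely from parametric knowledge, so the judge sees a genuine knowledge-versus-context conflict rather than two agents reasoning over the same passage.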