When to Trust Context: Self-Reflective Debates for Context Reliability

Abstract

Large language models frequently encounter conflicts between their parametric knowledge and contextual input, often resulting in factual inconsistencies or hallucinations. We propose Self-Reflective Debate for Contextual Reliability (SR-DCR), a lightweight framework that integrates token-level self-confidence with an asymmetric multi-agent debate to adjudicate such conflicts. A critic, deprived of context, challenges a defender who argues from the given passage; a judge model evaluates the debate and determines the context's reliability. The final answer is selected by combining the verdict with model confidence. Experiments on the ClashEval benchmark demonstrate that SR-DCR consistently enhances robustness to misleading context while maintaining accuracy on trustworthy inputs, outperforming both classical debate and confidence-only baselines with minimal computational overhead. The code is available at https://github.com/smiles724/Self-Reflective-Debates.
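The final selection step described in the abstract can be illustrated with a minimal sketch. It is not taken from the released code: the names `self_confidence`, `DebateOutcome`, and `select_answer`, the geometric-mean aggregation of token probabilities, the 0.9 threshold, and the fallback to the context answer when confidence is low are all assumptions made for illustration. The debate itself (critic, defender, judge) is assumed to have already produced a boolean verdict.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def self_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of the context-free answer.

    Assumption: the abstract only says "token-level self-confidence";
    this particular aggregation is illustrative.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


@dataclass
class DebateOutcome:
    """Hypothetical container for what the debate stage returns."""
    context_reliable: bool  # judge's verdict on the given passage
    context_answer: str     # answer grounded in the passage (defender side)
    prior_answer: str       # answer from parametric knowledge (no context)


def select_answer(
    outcome: DebateOutcome,
    token_logprobs: Sequence[float],
    threshold: float = 0.9,  # assumed value, not from the paper
) -> str:
    """Combine the judge's verdict with self-confidence (assumed rule)."""
    if outcome.context_reliable:
        # Judge trusts the passage: answer from context.
        return outcome.context_answer
    if self_confidence(token_logprobs) >= threshold:
        # Passage judged unreliable and the model is confident in its prior.
        return outcome.prior_answer
    # Passage judged unreliable but the prior is weak: fall back to context.
    return outcome.context_answer
```

Under these assumptions, a passage judged unreliable is overridden only when the model's own answer carries high token-level confidence; otherwise the context answer is kept, which reflects the abstract's goal of maintaining accuracy on trustworthy inputs.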
