What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models

Abstract

Large language models (LLMs) frequently rely on both contextual input and parametric knowledge to perform tasks. However, these sources can conflict, especially when retrieved documents contradict the model's parametric knowledge. We propose a diagnostic framework to systematically evaluate LLM behavior under context-memory conflict, where the contextual information diverges from the model's parametric beliefs. We construct diagnostic data that elicit these conflicts and analyze model performance across multiple task types. Our findings reveal that (1) knowledge conflict has minimal impact on tasks that do not require knowledge utilization, (2) model performance is consistently higher when contextual and parametric knowledge are aligned, (3) models are unable to fully suppress their internal knowledge even when instructed to, and (4) providing rationales that explain the conflict increases reliance on contexts. These insights raise concerns about the validity of model-based evaluation and underscore the need to account for knowledge conflict in the deployment of LLMs.
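
As a rough illustration of how context-memory conflict can be quantified, the sketch below assumes that each diagnostic example records three things: the answer supported by a conflicting context, the model's closed-book (parametric) answer, and the model's answer when given the context. It then reports how often the model follows the context, falls back on its memory, or does neither. The names (ConflictExample, classify, reliance_rates) and the exact-match scoring are hypothetical simplifications for illustration, not the framework used in the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConflictExample:
    # Hypothetical record layout; not taken from the paper.
    question: str
    context_answer: str  # answer supported by the (possibly counterfactual) context
    memory_answer: str   # answer the model gives closed-book, i.e. its parametric belief
    model_answer: str    # answer the model gives when the context is provided


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences do not count.
    return " ".join(text.lower().split())


def classify(ex: ConflictExample) -> str:
    # Exact-match labeling (a simplifying assumption). When context and memory
    # agree (the "aligned" case), a matching answer is counted as "context".
    pred = normalize(ex.model_answer)
    if pred == normalize(ex.context_answer):
        return "context"
    if pred == normalize(ex.memory_answer):
        return "memory"
    return "other"


def reliance_rates(examples: list[ConflictExample]) -> dict[str, float]:
    # Fraction of examples in which the model followed the context, its memory, or neither.
    if not examples:
        raise ValueError("need at least one example")
    counts = Counter(classify(ex) for ex in examples)
    total = len(examples)
    return {label: counts[label] / total for label in ("context", "memory", "other")}


if __name__ == "__main__":
    demo = [
        ConflictExample("Capital of Australia?", "Sydney", "Canberra", "Sydney"),
        ConflictExample("Capital of Australia?", "Sydney", "Canberra", " canberra "),
    ]
    print(reliance_rates(demo))
```

A higher "memory" rate on conflicting examples would indicate that the model is failing to suppress its internal knowledge. Comparing these rates with and without an instruction to trust the context is one simple way to probe the kind of behavior summarized in findings (3) and (4).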
