What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models

June 6, 2025
作者: Kaiser Sun, Fan Bai, Mark Dredze
cs.AI

Abstract

Large language models frequently rely on both contextual input and parametric knowledge to perform tasks. However, these sources can come into conflict, especially when retrieved documents contradict the model's parametric knowledge. We propose a diagnostic framework to systematically evaluate LLM behavior under context-memory conflict, where contextual information diverges from the model's parametric beliefs. We construct diagnostic data that elicit these conflicts and analyze model performance across multiple task types. Our findings reveal that (1) knowledge conflict has minimal impact on tasks that do not require knowledge utilization, (2) model performance is consistently higher when contextual and parametric knowledge are aligned, (3) models are unable to fully suppress their internal knowledge even when instructed to do so, and (4) providing rationales that explain the conflict increases reliance on the context. These insights raise concerns about the validity of model-based evaluation and underscore the need to account for knowledge conflict when deploying LLMs.
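To make the probe concrete, below is a minimal Python sketch of how such a context-memory conflict could be elicited and scored. It is an illustration under stated assumptions, not the authors' released framework: the ask wrapper, the prompt templates, and the substring-based scoring are all hypothetical stand-ins for whatever model API and metric one actually uses.

from typing import Callable, Dict

def probe_conflict(
    ask: Callable[[str], str],
    question: str,
    true_answer: str,
    counterfactual_passage: str,
    counterfactual_answer: str,
) -> Dict[str, object]:
    """Compare a model's closed-book answer with its answer when given a
    passage that contradicts its likely parametric knowledge."""
    # 1. Closed-book query: elicits the model's parametric belief.
    parametric = ask(f"Answer concisely: {question}")

    # 2. Open-book query with a contradicting passage, instructing the
    #    model to rely on the passage alone.
    contextual = ask(
        f"Document: {counterfactual_passage}\n"
        f"Using ONLY the document above, answer concisely: {question}"
    )

    return {
        "parametric_answer": parametric,
        "contextual_answer": contextual,
        # Did the open-book answer follow the context, or did memory leak in?
        "followed_context": counterfactual_answer.lower() in contextual.lower(),
        "leaked_memory": true_answer.lower() in contextual.lower(),
    }

# Toy usage with a stub in place of a real model call:
if __name__ == "__main__":
    def stub(prompt: str) -> str:
        return "Paris"  # a real wrapper would call an LLM API here

    report = probe_conflict(
        stub,
        question="What is the capital of France?",
        true_answer="Paris",
        counterfactual_passage="The capital of France is Lyon.",
        counterfactual_answer="Lyon",
    )
    print(report)  # with this stub: followed_context=False, leaked_memory=True

In this sketch, finding (3) corresponds to runs where leaked_memory stays true even though the prompt instructs the model to use only the document, and finding (4) could be tested by appending a rationale for the contradiction to the passage.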