

From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

June 18, 2024
作者: Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh
cs.AI

Abstract

Retrieval Augmented Generation (RAG) enriches the ability of language models to reason over external context when responding to a given user prompt. This approach has risen in popularity due to practical applications of language models in search, question answering, and chatbots. However, exactly how this approach works is not clearly understood. In this paper, we mechanistically examine the RAG pipeline to highlight that language models take a shortcut and have a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory. We probe this mechanistic behavior in language models with: (i) Causal Mediation Analysis, to show that parametric memory is minimally utilized when answering a question, and (ii) Attention Contributions and Knockouts, to show that the last-token residual stream is not enriched by the subject token in the question, but is enriched by other informative tokens in the context. We find this pronounced shortcut behavior holds across both the LLaMa and Phi families of models.
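The attention-knockout probe mentioned in (ii) can be illustrated with a toy sketch. This is not the paper's code: it is a minimal single-head attention example (random matrices, hypothetical dimensions) showing the general technique of masking a chosen source position before the softmax and measuring how much the last token's attention output changes, a proxy for how much that token enriches the last-token residual stream.

```python
# Hedged sketch, not the authors' implementation: attention knockout on a
# toy single-head attention layer with random keys/values.
import numpy as np

def attention_output(q_last, K, V, blocked=()):
    """Attention output for the last token's query; source positions in
    `blocked` are knocked out (logits set to -inf before the softmax)."""
    logits = K @ q_last / np.sqrt(len(q_last))
    for pos in blocked:
        logits[pos] = -np.inf          # knockout: token can't be attended to
    weights = np.exp(logits - logits[np.isfinite(logits)].max())
    weights /= weights.sum()           # renormalize over remaining positions
    return weights @ V

rng = np.random.default_rng(0)
d = 8                                  # hypothetical head dimension
K = rng.normal(size=(5, d))            # keys for 5 question/context tokens
V = rng.normal(size=(5, d))            # values for those tokens
q_last = rng.normal(size=d)            # query vector of the last token

baseline = attention_output(q_last, K, V)
knocked = attention_output(q_last, K, V, blocked=[1])  # knock out token 1

# A large change suggests position 1 was enriching the last-token output;
# a small change suggests it contributed little (the paper's finding for
# subject tokens in the question, under a RAG context).
print(np.linalg.norm(baseline - knocked))
```

Running the knockout separately for subject tokens versus context tokens and comparing the resulting output shifts is the spirit of the contribution analysis described in the abstract.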

