DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs
June 10, 2025
Authors: Arie Cattan, Alon Jacovi, Ori Ram, Jonathan Herzig, Roee Aharoni, Sasha Goldshtein, Eran Ofek, Idan Szpektor, Avi Caciularu
cs.AI
Abstract
Retrieval Augmented Generation (RAG) is a commonly used approach for
enhancing large language models (LLMs) with relevant and up-to-date
information. However, the retrieved sources can often contain conflicting
information, and it remains unclear how models should address such
discrepancies. In this work, we first propose a novel taxonomy of knowledge
conflict types in RAG, along with the desired model behavior for each type. We
then introduce CONFLICTS, a high-quality benchmark with expert annotations of
conflict types in a realistic RAG setting. CONFLICTS is the first benchmark
that enables tracking progress on how models address a wide range of knowledge
conflicts. We conduct extensive experiments on this benchmark, showing that
LLMs often struggle to appropriately resolve conflicts between sources. While
prompting LLMs to explicitly reason about the potential conflict in the
retrieved documents significantly improves the quality and appropriateness of
their responses, substantial room for improvement in future research remains.
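The prompting strategy mentioned above, asking the model to explicitly reason about conflicts before answering, can be illustrated with a minimal sketch. The function name, prompt wording, and document format below are illustrative assumptions, not the paper's actual prompts:

```python
def build_conflict_aware_prompt(question: str, documents: list[str]) -> str:
    """Assemble a RAG prompt that instructs the model to first check
    whether the retrieved documents disagree, and to surface each
    supported answer with its source if they do."""
    # Number the documents so the model can cite them when reporting a conflict.
    doc_block = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    return (
        "You are given several retrieved documents that may disagree.\n"
        "First, state whether the documents conflict about the answer.\n"
        "If they conflict, present each supported answer with its source;\n"
        "otherwise, answer directly from the documents.\n\n"
        f"{doc_block}\n\nQuestion: {question}"
    )

# Example: two retrieved snippets that give contradictory dates.
prompt = build_conflict_aware_prompt(
    "When was the bridge completed?",
    ["The bridge was completed in 1932.", "Construction finished in 1936."],
)
print(prompt)
```

The design point is simply that conflict detection is made an explicit instruction in the prompt rather than left implicit, which is the behavior the abstract reports as improving response quality.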