DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs
June 10, 2025
Authors: Arie Cattan, Alon Jacovi, Ori Ram, Jonathan Herzig, Roee Aharoni, Sasha Goldshtein, Eran Ofek, Idan Szpektor, Avi Caciularu
cs.AI
Abstract
Retrieval Augmented Generation (RAG) is a commonly used approach for
enhancing large language models (LLMs) with relevant and up-to-date
information. However, the retrieved sources can often contain conflicting
information, and it remains unclear how models should address such
discrepancies. In this work, we first propose a novel taxonomy of knowledge
conflict types in RAG, along with the desired model behavior for each type. We
then introduce CONFLICTS, a high-quality benchmark with expert annotations of
conflict types in a realistic RAG setting. CONFLICTS is the first benchmark
that enables tracking progress on how models address a wide range of knowledge
conflicts. We conduct extensive experiments on this benchmark, showing that
LLMs often struggle to appropriately resolve conflicts between sources. While
prompting LLMs to explicitly reason about the potential conflict in the
retrieved documents significantly improves the quality and appropriateness of
their responses, substantial room for improvement in future research remains.
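The prompting strategy mentioned above, asking the model to explicitly reason about conflicts before answering, can be illustrated with a minimal sketch. The function name, prompt wording, and document format below are illustrative assumptions, not the paper's actual prompts:

```python
def build_conflict_aware_prompt(question: str, documents: list[str]) -> str:
    """Assemble a RAG prompt that instructs the model to first check
    whether the retrieved documents disagree, and to surface each
    supported answer with its source if they do."""
    # Number the documents so the model can cite them when reporting a conflict.
    doc_block = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    return (
        "You are given several retrieved documents that may disagree.\n"
        "First, state whether the documents conflict about the answer.\n"
        "If they conflict, present each supported answer with its source;\n"
        "otherwise, answer directly from the documents.\n\n"
        f"{doc_block}\n\nQuestion: {question}"
    )

# Example: two retrieved snippets that give contradictory dates.
prompt = build_conflict_aware_prompt(
    "When was the bridge completed?",
    ["The bridge was completed in 1932.", "Construction finished in 1936."],
)
print(prompt)
```

The design point is simply that conflict detection is made an explicit instruction in the prompt rather than left implicit, which is the behavior the abstract reports as improving response quality.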