DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs

Abstract

Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models (LLMs) with relevant and up-to-date information. However, the retrieved sources often contain conflicting information, and it remains unclear how models should address such discrepancies. In this work, we first propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting. CONFLICTS is the first benchmark that enables tracking progress on how models address a wide range of knowledge conflicts. We conduct extensive experiments on this benchmark, showing that LLMs often struggle to appropriately resolve conflicts between sources. Prompting LLMs to explicitly reason about potential conflicts in the retrieved documents significantly improves the quality and appropriateness of their responses, but substantial room for improvement remains for future research.
