ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

August 22, 2024
Authors: Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng
cs.AI

Abstract

Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge, and a thorough assessment of knowledge conflicts in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation covers four model families and twelve LLM instances, analyzing conflicts that stem from misinformation, temporal discrepancies, and semantic divergences. Based on our proposed construction framework, we create 7,453,853 claim-evidence pairs and 553,117 QA pairs. We present numerous findings on model scale, conflict causes, and conflict types. We hope our ConflictBank benchmark will help the community better understand model behavior in conflicts and develop more reliable LLMs.
