

REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models

February 19, 2025
Authors: DongGeon Lee, Hwanjo Yu
cs.AI

Abstract

Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering. To address this challenge, we introduce REFIND (Retrieval-augmented Factuality hallucINation Detection), a novel framework that detects hallucinated spans within LLM outputs by directly leveraging retrieved documents. As part of REFIND, we propose the Context Sensitivity Ratio (CSR), a novel metric that quantifies the sensitivity of LLM outputs to retrieved evidence. This innovative approach enables REFIND to detect hallucinations efficiently and accurately, setting it apart from existing methods. In our evaluation, REFIND demonstrated robustness across nine languages, including low-resource settings, and significantly outperformed baseline models, achieving superior IoU scores in identifying hallucinated spans. This work highlights the effectiveness of quantifying context sensitivity for hallucination detection, thereby paving the way for more reliable and trustworthy LLM applications across diverse languages.
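The abstract does not give the CSR formula, but the stated idea (quantifying how sensitive each output token is to retrieved evidence) can be illustrated with a minimal sketch. The ratio definition, the threshold, and the probability values below are assumptions for illustration only; the paper's actual formulation may differ.

```python
# Hypothetical sketch: score each output token by how much retrieved
# evidence changes its likelihood, then flag insensitive tokens as
# potentially hallucinated. NOT the paper's exact CSR definition.

def context_sensitivity_ratio(p_with_context, p_without_context, eps=1e-9):
    """Per-token ratio of the token's probability when the LLM is
    conditioned on retrieved documents vs. when it is not."""
    return [(pc + eps) / (pn + eps)
            for pc, pn in zip(p_with_context, p_without_context)]

def flag_hallucinated_spans(ratios, threshold=1.0):
    """Tokens whose likelihood does not increase under retrieved evidence
    (ratio below the assumed threshold) are flagged as suspect."""
    return [i for i, r in enumerate(ratios) if r < threshold]

# Toy example: token 2 becomes *less* likely once evidence is added,
# suggesting it is unsupported by the retrieved documents.
p_ctx = [0.9, 0.8, 0.1]     # illustrative token probabilities with context
p_no_ctx = [0.5, 0.4, 0.6]  # illustrative probabilities without context
ratios = context_sensitivity_ratio(p_ctx, p_no_ctx)
print(flag_hallucinated_spans(ratios))  # → [2]
```

In practice the two probability sequences would come from scoring the same LLM output twice (with and without the retrieved documents in the prompt); contiguous flagged tokens would then be merged into spans for IoU evaluation.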

