LettuceDetect: A Hallucination Detection Framework for RAG Applications
February 24, 2025
Authors: Ádám Kovács, Gábor Recski
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) systems remain vulnerable to
hallucinated answers despite incorporating external knowledge sources. We
present LettuceDetect, a framework that addresses two critical limitations in
existing hallucination detection methods: (1) the context window constraints of
traditional encoder-based methods, and (2) the computational inefficiency of
LLM-based approaches. Building on ModernBERT's extended context capabilities
(up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach
outperforms all previous encoder-based models and most prompt-based models,
while being approximately 30 times smaller than the best models. LettuceDetect
is a token-classification model that processes context-question-answer triples,
allowing for the identification of unsupported claims at the token level.
Evaluations on the RAGTruth corpus demonstrate an F1 score of 79.22% for
example-level detection, which is a 14.8% improvement over Luna, the previous
state-of-the-art encoder-based architecture. Additionally, the system can
process 30 to 60 examples per second on a single GPU, making it more practical
for real-world RAG applications.
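
To illustrate how token-level predictions can yield both highlighted spans and an example-level decision, the following is a minimal sketch in plain Python. It is not the LettuceDetect API: the function names `merge_spans` and `is_hallucinated`, the character offsets, and the labels are all hypothetical and written by hand. It assumes a token classifier has already assigned each answer token a label of 1 (unsupported by the context) or 0 (supported).

```python
from typing import List, Tuple


def merge_spans(offsets: List[Tuple[int, int]], labels: List[int]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive tokens labeled 1 into (start, end) character spans."""
    spans: List[Tuple[int, int]] = []
    current = None  # span being built, as [start, end], or None
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]  # open a new span
            else:
                current[1] = end  # extend the open span
        elif current is not None:
            spans.append((current[0], current[1]))  # close the span
            current = None
    if current is not None:
        spans.append((current[0], current[1]))
    return spans


def is_hallucinated(labels: List[int]) -> bool:
    """Example-level decision: flag the answer if any token is unsupported."""
    return any(label == 1 for label in labels)


# Hand-written illustration (assumed data, not model output).
answer = "The capital of France is Lyon."
# Character offsets for: The, capital, of, France, is, Lyon, "."
offsets = [(0, 3), (4, 11), (12, 14), (15, 21), (22, 24), (25, 29), (29, 30)]
labels = [0, 0, 0, 0, 0, 1, 0]

spans = merge_spans(offsets, labels)
flagged_text = [answer[s:e] for s, e in spans]
example_flag = is_hallucinated(labels)
```

Here `flagged_text` holds the substrings of the answer marked as unsupported, and `example_flag` is the answer-level verdict. Treating an answer as hallucinated when at least one token is flagged is one simple aggregation rule; the abstract does not specify the rule the authors use.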