Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
September 26, 2025
Authors: Yehonatan Peisakhovsky, Zorik Gekhman, Yosi Mass, Liat Ein-Dor, Roi Reichart
cs.AI
Abstract
Context-grounded hallucinations are cases where model outputs contain
information not verifiable against the source text. We study the applicability
of LLMs for localizing such hallucinations, as a more practical alternative to
existing complex evaluation pipelines. In the absence of established benchmarks
for meta-evaluation of hallucination localization, we construct one tailored
to LLMs, involving a challenging human annotation of over 1,000 examples. We
complement the benchmark with an LLM-based evaluation protocol, verifying its
quality in a human evaluation. Since existing representations of hallucinations
limit the types of errors that can be expressed, we propose a new
representation based on free-form textual descriptions, capturing the full
range of possible errors. We conduct a comprehensive study, evaluating four
large-scale LLMs, which highlights the benchmark's difficulty, as the best
model achieves an F1 score of only 0.67. Through careful analysis, we offer
insights into optimal prompting strategies for the task and identify the main
factors that make it challenging for LLMs: (1) a tendency to incorrectly flag
missing details as inconsistent, despite being instructed to check only facts
in the output; and (2) difficulty with outputs containing factually correct
information that is absent from the source (and thus not verifiable) because it
aligns with the model's parametric knowledge.
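To make the free-form representation and the F1 scoring concrete, below is a minimal sketch in Python. It is not the authors' implementation: the field names, the toy example text, and the matching setup are illustrative assumptions. It shows how a benchmark example with free-text error descriptions might be stored, and how precision, recall, and F1 could be computed once predicted descriptions are matched against gold ones, e.g. by an LLM judge in the spirit of the paper's LLM-based evaluation protocol.

from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative schema (field names are assumptions, not the paper's actual format):
# each example pairs a source text with a model output and free-form textual
# descriptions of the unsupported content, rather than fixed error-type labels or spans.
@dataclass
class Example:
    source: str                    # grounding document
    output: str                    # generated text to be checked against the source
    gold_errors: List[str] = field(default_factory=list)  # human-written error descriptions
    pred_errors: List[str] = field(default_factory=list)  # descriptions produced by the detector LLM

def prf1(num_matched: int, num_pred: int, num_gold: int) -> Tuple[float, float, float]:
    """Precision/recall/F1 over error descriptions, given how many predicted
    descriptions were judged to match a gold description (the matching itself
    would be delegated to an LLM judge or a human annotator)."""
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gold if num_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy usage with a hypothetical example.
ex = Example(
    source="The meeting was held in Geneva on 12 March.",
    output="The meeting took place in Zurich in March.",
    gold_errors=["The output places the meeting in Zurich, but the source says Geneva."],
    pred_errors=["The location is given as Zurich, which contradicts Geneva in the source."],
)
# Suppose a judge decides the single predicted description matches the gold one.
print(prf1(num_matched=1, num_pred=len(ex.pred_errors), num_gold=len(ex.gold_errors)))  # (1.0, 1.0, 1.0)

How per-example scores are aggregated over the benchmark (micro vs. macro) is left open here; the sketch only illustrates the representation and the metric, not the paper's exact scoring procedure.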