Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

September 26, 2025
Authors: Yehonatan Peisakhovsky, Zorik Gekhman, Yosi Mass, Liat Ein-Dor, Roi Reichart
cs.AI

Abstract

Context-grounded hallucinations are cases where model outputs contain information not verifiable against the source text. We study the applicability of LLMs for localizing such hallucinations, as a more practical alternative to existing complex evaluation pipelines. In the absence of established benchmarks for meta-evaluation of hallucination localization, we construct one tailored to LLMs, involving a challenging human annotation of over 1,000 examples. We complement the benchmark with an LLM-based evaluation protocol, verifying its quality through human evaluation. Since existing representations of hallucinations limit the types of errors that can be expressed, we propose a new representation based on free-form textual descriptions, capturing the full range of possible errors. We conduct a comprehensive study evaluating four large-scale LLMs, which highlights the benchmark's difficulty: the best model achieves an F1 score of only 0.67. Through careful analysis, we offer insights into optimal prompting strategies for the task and identify the main factors that make it challenging for LLMs: (1) a tendency to incorrectly flag missing details as inconsistent, despite being instructed to check only facts stated in the output; and (2) difficulty with outputs containing factually correct information that is absent from the source, and thus not verifiable, because it aligns with the model's parametric knowledge.
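
To make the proposed free-form representation concrete, below is a minimal sketch, not the authors' code: the prompt wording, the HallucinationReport fields, and the Geneva example are all hypothetical illustrations of how a localization prompt might ask an LLM to describe each unverifiable statement in free text, while steering it away from the two failure modes the abstract identifies.

```python
# Minimal sketch (assumptions throughout): representing context-grounded
# hallucinations as free-form textual descriptions rather than fixed
# error categories or span labels.
from dataclasses import dataclass


@dataclass
class HallucinationReport:
    """One detected inconsistency, described in free-form text (hypothetical schema)."""
    output_span: str   # the problematic span from the model output
    description: str   # free-form explanation of why it is unsupported


def build_detection_prompt(source: str, output: str) -> str:
    """Assemble a localization prompt. The instruction to check only facts
    stated in the output targets the paper's first failure mode: models
    incorrectly flagging merely missing details as inconsistencies."""
    return (
        "Source text:\n"
        f"{source}\n\n"
        "Model output:\n"
        f"{output}\n\n"
        "List every statement in the output that cannot be verified against "
        "the source. Describe each error in free-form text. Do not flag "
        "details that are merely missing from the output."
    )


if __name__ == "__main__":
    source = "The meeting was held in Geneva in March."
    output = "The meeting was held in Geneva in March 2021."
    print(build_detection_prompt(source, output))

    # Expected kind of finding: "2021" is unverifiable against the source even
    # if it happens to be factually correct -- the second challenge the
    # abstract identifies (parametric knowledge vs. source grounding).
    reports = [
        HallucinationReport(
            output_span="March 2021",
            description=(
                "The year 2021 does not appear in the source, so this detail "
                "is unverifiable regardless of its real-world accuracy."
            ),
        )
    ]
    for r in reports:
        print(f"- {r.output_span}: {r.description}")
```

Because each error is a free-text description rather than a label from a fixed taxonomy, this representation can express omission-free factual drift, unverifiable additions, and other error types that span-based schemes cannot capture, which is the motivation the abstract gives for it.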