Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Abstract

Context-grounded hallucinations are cases in which a model's output contains information that cannot be verified against the source text. We study whether LLMs can localize such hallucinations, as a more practical alternative to existing complex evaluation pipelines. Because no established benchmark exists for meta-evaluation of hallucination localization, we construct one tailored to LLMs, which involved challenging human annotation of over 1,000 examples. We complement the benchmark with an LLM-based evaluation protocol and verify its quality through a human evaluation. Since existing representations of hallucinations limit the types of errors that can be expressed, we propose a new representation based on free-form textual descriptions, which captures the full range of possible errors. We conduct a comprehensive study of four large-scale LLMs, which highlights the benchmark's difficulty: the best model achieves an F1 score of only 0.67. Through careful analysis, we offer insights into optimal prompting strategies for the task and identify the main factors that make it challenging for LLMs: (1) a tendency to incorrectly flag missing details as inconsistencies, despite being instructed to check only the facts stated in the output; and (2) difficulty with outputs that contain information which is factually correct but absent from the source, and thus not verifiable, because that information aligns with the model's parametric knowledge.
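
For readers who want the arithmetic behind the reported F1 score, the following is a minimal sketch of how precision, recall, and F1 could be computed once predicted free-form descriptions have been matched against human-annotated ones. The `MatchCounts` structure, its field names, and the example numbers are hypothetical and chosen only for illustration; the abstract does not specify how matches are tallied, so this should not be read as the paper's actual evaluation protocol.

```python
from dataclasses import dataclass


@dataclass
class MatchCounts:
    """Hypothetical tallies over a benchmark; field names are illustrative."""
    matched_predictions: int  # predicted descriptions judged to match a gold one
    total_predictions: int    # all descriptions produced by the model
    matched_gold: int         # gold descriptions covered by some prediction
    total_gold: int           # all human-annotated descriptions


def precision_recall_f1(c: MatchCounts) -> tuple[float, float, float]:
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    precision = c.matched_predictions / c.total_predictions if c.total_predictions else 0.0
    recall = c.matched_gold / c.total_gold if c.total_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts, not taken from the paper.
example = MatchCounts(matched_predictions=3, total_predictions=4,
                      matched_gold=3, total_gold=6)
precision, recall, f1 = precision_recall_f1(example)
```

With these made-up counts, precision is 0.75 and recall is 0.5, so the harmonic mean is 0.6. Precision and recall are tallied separately because, with free-form descriptions, one prediction may cover several gold annotations or the reverse.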
