Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Abstract

Context-grounded hallucinations are cases where model outputs contain information that cannot be verified against the source text. We study the applicability of LLMs for localizing such hallucinations, as a more practical alternative to existing complex evaluation pipelines. In the absence of established benchmarks for meta-evaluation of hallucination localization, we construct one tailored to LLMs, involving a challenging human annotation of over 1,000 examples. We complement the benchmark with an LLM-based evaluation protocol and verify its quality through human evaluation. Since existing representations of hallucinations limit the types of errors that can be expressed, we propose a new representation based on free-form textual descriptions, capturing the full range of possible errors. We conduct a comprehensive study evaluating four large-scale LLMs, which highlights the benchmark's difficulty: the best model achieves an F1 score of only 0.67. Through careful analysis, we offer insights into optimal prompting strategies for the task and identify the main factors that make it challenging for LLMs: (1) a tendency to incorrectly flag missing details as inconsistent, despite being instructed to check only facts in the output; and (2) difficulty with outputs containing factually correct information that is absent from the source, and thus not verifiable, but that aligns with the model's parametric knowledge.
