ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Abstract

Large language models (LLMs) hallucinate in long-form question-answering tasks across many domains and applications. Existing hallucination detection and mitigation datasets are limited in domain coverage and size, and they are hard to scale because of prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To enable scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, each iteration of the framework first applies a hallucination annotation pipeline to annotate a scaled-up dataset and then trains a more accurate hallucination annotator on that dataset. The new annotator is then adopted in the annotation pipeline used for the next iteration. Extensive experiments show that the final hallucination annotator, with only 7B parameters, surpasses GPT-4 and achieves new state-of-the-art hallucination detection results on HaluEval and HalluQA with zero-shot inference. The annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help mitigate hallucination in LLM generations, raising the Natural Language Inference (NLI) metric from 25% to 37% on HaluEval.
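The annotate-then-retrain loop described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation: the one-dimensional "support score" feature, the threshold-based ToyAnnotator, and the function names train_annotator and iterative_self_training are all hypothetical stand-ins chosen only to show the shape of the iteration (annotate a larger dataset with the current annotator, then retrain on the result).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical toy example: (support score, label), where label 1 = hallucinated.
Example = Tuple[float, int]


@dataclass
class ToyAnnotator:
    """Stand-in for a hallucination annotator: flags low-support answers."""
    threshold: float

    def annotate(self, support: float) -> int:
        return 1 if support < self.threshold else 0


def train_annotator(examples: Sequence[Example]) -> ToyAnnotator:
    """Fit the threshold as the midpoint between the two class means."""
    hallucinated = [x for x, y in examples if y == 1]
    faithful = [x for x, y in examples if y == 0]
    if not hallucinated or not faithful:
        raise ValueError("need at least one example of each class")
    mean_h = sum(hallucinated) / len(hallucinated)
    mean_f = sum(faithful) / len(faithful)
    return ToyAnnotator(threshold=(mean_h + mean_f) / 2)


def iterative_self_training(
    seed: Sequence[Example], pools: Sequence[Sequence[float]]
) -> Tuple[ToyAnnotator, List[Example]]:
    """Alternate annotation and retraining, growing the dataset each round."""
    dataset = list(seed)
    annotator = train_annotator(dataset)
    for pool in pools:
        # Annotation step: label the newly added data with the current annotator.
        dataset.extend((x, annotator.annotate(x)) for x in pool)
        # Training step: fit a new annotator on the enlarged dataset.
        annotator = train_annotator(dataset)
    return annotator, dataset


# Assumed toy data: a small labeled seed set and two unlabeled pools.
seed = [(0.1, 1), (0.9, 0)]
pools = [[0.2, 0.3, 0.7, 0.8], [0.4, 0.6, 0.45, 0.55]]
annotator, dataset = iterative_self_training(seed, pools)
```

In this sketch the dataset grows from 2 to 10 examples over two rounds, and each round's annotator is the one used to label the next pool. A real pipeline would replace the threshold model with a trained LLM annotator and add whatever filtering the annotation pipeline applies.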
