ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
July 5, 2024
Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen
cs.AI
Abstract
Large language models (LLMs) exhibit hallucinations in long-form
question-answering tasks across various domains and wide applications. Current
hallucination detection and mitigation datasets are limited in domain and
size, and they are hard to scale because of prohibitive labor costs and the
insufficient reliability of existing hallucination annotators. To facilitate the scalable
oversight of LLM hallucinations, this paper introduces an iterative
self-training framework that simultaneously and progressively scales up the
hallucination annotation dataset and improves the accuracy of the hallucination
annotator. Based on the Expectation Maximization (EM) algorithm, in each
iteration, the framework first applies a hallucination annotation pipeline to
annotate a scaled dataset and then trains a more accurate hallucination
annotator on the dataset. This new hallucination annotator is adopted in the
hallucination annotation pipeline used for the next iteration. Extensive
experimental results demonstrate that the final hallucination annotator,
with only 7B parameters, surpasses the performance of GPT-4 and
obtains new state-of-the-art hallucination detection results on HaluEval and
HalluQA with zero-shot inference. Such an annotator can not only evaluate the
hallucination levels of various LLMs on a large-scale dataset but also help
mitigate hallucinations in LLM generations, with the Natural Language
Inference (NLI) metric increasing from 25% to 37% on HaluEval.
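
The iterative loop described in the abstract (annotate a larger dataset with the current annotator, train a new annotator on it, repeat) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the threshold "annotator", the `train_annotator` rule, the numeric examples, and all function names are assumptions made purely for illustration. The annotate step loosely plays the role of estimating labels and the train step the role of updating the model, in the spirit of the EM framing.

```python
from typing import Callable, List, Tuple

Example = float   # hypothetical stand-in for a (question, answer) pair
Label = int       # 1 = hallucinated, 0 = not hallucinated
Annotator = Callable[[Example], Label]


def make_threshold_annotator(threshold: float) -> Annotator:
    """Toy annotator (assumption): flag an example whose score exceeds a threshold."""
    return lambda x: 1 if x > threshold else 0


def train_annotator(labelled: List[Tuple[Example, Label]]) -> Annotator:
    """Toy 'training' (assumption): put the threshold midway between the class means."""
    pos = [x for x, y in labelled if y == 1]
    neg = [x for x, y in labelled if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to fit the toy annotator")
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return make_threshold_annotator(threshold)


def iterative_self_training(
    seed_annotator: Annotator,
    pool: List[Example],
    sizes: List[int],
) -> Annotator:
    """Alternate annotation and training over progressively larger subsets."""
    annotator = seed_annotator
    for size in sizes:
        subset = pool[:size]                              # scaled-up dataset
        labelled = [(x, annotator(x)) for x in subset]    # annotate with current annotator
        annotator = train_annotator(labelled)             # train the next annotator
    return annotator


if __name__ == "__main__":
    pool = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
    seed = make_threshold_annotator(0.35)   # deliberately miscalibrated seed
    final = iterative_self_training(seed, pool, sizes=[4, 8])
    print(seed(0.4), final(0.4), final(0.6))
```

In this toy setting the seed annotator mislabels the borderline example `0.4`, while the annotator obtained after two rounds does not. A real pipeline would replace the threshold rule with a fine-tuned LLM and the numeric pool with question-answer data.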