ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
July 5, 2024
Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen
cs.AI
Abstract
Large language models (LLMs) hallucinate in long-form question-answering
tasks across many domains and applications. Current hallucination detection
and mitigation datasets are limited in domain coverage and size, and they are
hard to scale because of prohibitive labor costs and the insufficient
reliability of existing hallucination annotators. To facilitate scalable
oversight of LLM hallucinations, this paper introduces an iterative
self-training framework that simultaneously and progressively scales up the
hallucination annotation dataset and improves the accuracy of the hallucination
annotator. Based on the Expectation Maximization (EM) algorithm, in each
iteration the framework first applies a hallucination annotation pipeline to
annotate a scaled dataset and then trains a more accurate hallucination
annotator on that dataset. The new annotator is then adopted in the
annotation pipeline for the next iteration. Extensive experimental results
demonstrate that the final hallucination annotator, with only 7B parameters,
surpasses the performance of GPT-4 and obtains new state-of-the-art
hallucination detection results on HaluEval and HalluQA by zero-shot
inference. Such an annotator can not only evaluate the hallucination levels of
various LLMs on the large-scale dataset but also help mitigate hallucinations
in LLM generations, with the Natural Language Inference (NLI) metric
increasing from 25% to 37% on HaluEval.
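The alternation described in the abstract (annotate a larger dataset with the current annotator, then retrain the annotator on it) can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the real annotator is a 7B-parameter LLM, whereas here it is a hypothetical one-dimensional threshold over an assumed "support score" where low values mean "hallucinated". The names `train_annotator`, `self_train`, `seed`, and `pools` are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy annotator: maps a support score to True ("hallucinated").
Annotator = Callable[[float], bool]
Example = Tuple[float, bool]


def train_annotator(labeled: Sequence[Example]) -> Annotator:
    """Toy training step: place a threshold midway between the class means."""
    pos = [x for x, y in labeled if y]        # scores labeled hallucinated
    neg = [x for x, y in labeled if not y]    # scores labeled faithful
    if not pos or not neg:
        raise ValueError("both classes are needed to fit a threshold")
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x < threshold


def self_train(
    seed: Sequence[Example], pools: Sequence[Sequence[float]]
) -> Tuple[Annotator, List[int]]:
    """Alternate annotation and retraining over progressively added pools."""
    dataset: List[Example] = list(seed)
    annotator = train_annotator(dataset)      # initial annotator from seed labels
    sizes = [len(dataset)]
    for pool in pools:
        # Annotation step: label new unlabeled items with the current annotator.
        dataset.extend((x, annotator(x)) for x in pool)
        # Training step: fit a new annotator on the enlarged dataset.
        annotator = train_annotator(dataset)
        sizes.append(len(dataset))
    return annotator, sizes


# Assumed toy data: two seed labels, then two unlabeled pools of growing size.
seed = [(0.1, True), (0.9, False)]
pools = [[0.2, 0.3, 0.8], [0.15, 0.4, 0.7, 0.95]]
annotator, sizes = self_train(seed, pools)
print(sizes)
```

The point of the sketch is only the loop shape: the dataset grows at every iteration, and each new annotator is the one used to label the next batch.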