

Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training

October 20, 2024
Authors: Shahrad Mohammadzadeh, Juan David Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi
cs.AI

Abstract

As large language models (LLMs) become increasingly deployed across various industries, concerns regarding their reliability have grown, particularly due to hallucinations: outputs that are factually inaccurate or irrelevant to user input. Our research investigates the relationship between the training process and the emergence of hallucinations, addressing a key gap in existing research, which focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M-12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce SEnsitive Neuron Dropout (SeND), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SeND achieves this by deterministically dropping neurons with significant variability on a dataset, referred to as Sensitive Neurons. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at twice the speed. This efficient metric is integrated into our protocol, allowing SeND to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training, while also providing an efficient method to improve factual accuracy when adapting LLMs to domains such as Wikipedia and medical datasets.
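The abstract describes SeND only at a high level: rank neurons by how much their activations vary across a dataset, then deterministically drop the most variable ("sensitive") ones. A minimal sketch of that idea might look like the following; all function names and the NumPy setup here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def find_sensitive_neurons(activations, k):
    """Rank neurons by activation variance across dataset samples.

    activations: (n_samples, n_neurons) matrix of hidden activations
    collected over a dataset. Returns the indices of the k
    highest-variance ("sensitive") neurons.
    """
    variance = activations.var(axis=0)
    return np.argsort(variance)[-k:]

def send_dropout_mask(activations, k):
    """Deterministic dropout mask zeroing the k most variable neurons.

    Unlike standard dropout, the mask is not random: the same
    high-variance neurons are dropped every time for this dataset.
    """
    n_neurons = activations.shape[1]
    mask = np.ones(n_neurons)
    mask[find_sensitive_neurons(activations, k)] = 0.0
    return mask

# Toy example: one neuron is made far more variable than the rest.
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 8))
acts[:, 3] *= 10.0          # neuron 3 now has ~100x the variance
mask = send_dropout_mask(acts, k=1)
print(mask)                  # mask[3] is 0.0; all others are 1.0
```

In a real training loop, such a mask would be recomputed periodically from fresh activation statistics and applied to the layer's outputs, which is where the variance-reduction effect the abstract claims would come from.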


PDF | November 16, 2024