Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training

Abstract

As large language models (LLMs) are increasingly deployed across industries, concerns about their reliability have grown, particularly regarding hallucinations: outputs that are factually inaccurate or irrelevant to the user input. Existing research focuses primarily on post hoc detection and mitigation strategies. To address this gap, our research investigates the relationship between the training process and the emergence of hallucinations. Using models from the Pythia suite (70M-12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce SEnsitive Neuron Dropout (SeND), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SeND achieves this by deterministically dropping neurons that show significant variability on a dataset, referred to as Sensitive Neurons. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at twice the speed. This efficient metric is integrated into our protocol, allowing SeND to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training, while also providing an efficient method to improve factual accuracy when adapting LLMs to domains such as Wikipedia and medical datasets.
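The idea of deterministically dropping high-variability neurons can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how variability is measured or how many neurons are dropped, so the per-neuron variance criterion, the `drop_fraction` parameter, and the function names below are all assumptions made for illustration.

```python
from statistics import pvariance


def sensitive_neuron_indices(activations, drop_fraction=0.2):
    """Return indices of the neurons whose activations vary most.

    activations: list of rows, one row per sample, one column per neuron.
    drop_fraction: hypothetical hyperparameter, the share of neurons to drop.
    """
    num_neurons = len(activations[0])
    # Per-neuron population variance across all samples (matrix columns).
    variances = [
        pvariance([row[j] for row in activations]) for j in range(num_neurons)
    ]
    k = int(num_neurons * drop_fraction)
    # Deterministic selection: highest variance first, ties broken by index.
    ranked = sorted(range(num_neurons), key=lambda j: (-variances[j], j))
    return sorted(ranked[:k])


def drop_neurons(row, dropped):
    """Zero out the selected neurons in one activation vector."""
    dropped = set(dropped)
    return [0.0 if j in dropped else value for j, value in enumerate(row)]


# Toy activations: 4 samples x 4 neurons (made-up numbers for illustration).
activations = [
    [0.1, 2.0, 0.5, -1.0],
    [0.1, -2.0, 0.5, 1.0],
    [0.1, 2.0, 0.6, -1.0],
    [0.1, -2.0, 0.4, 1.0],
]
dropped = sensitive_neuron_indices(activations, drop_fraction=0.5)
masked = drop_neurons(activations[0], dropped)
```

Unlike standard dropout, which masks neurons at random, the selection here is a deterministic function of the observed activations, so the same data always yields the same mask. In this toy example, neurons 1 and 3 fluctuate the most and are the ones zeroed out.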
