How new data permeates LLM knowledge and how to dilute it

Abstract

Large language models learn, and continue to learn, through the accumulation of gradient-based updates, but how individual pieces of new information affect existing knowledge, leading to both beneficial generalization and problematic hallucination, remains poorly understood. We demonstrate that when learning new information, LLMs exhibit a "priming" effect: learning a new fact can cause the model to inappropriately apply that knowledge in unrelated contexts. To systematically study this phenomenon, we introduce "Outlandish," a carefully curated dataset of 1320 diverse text samples designed to probe how new knowledge permeates an LLM's existing knowledge base. Using this dataset, we show that the degree of priming after learning new information can be predicted by measuring the token probability of key words before learning. This relationship holds robustly across different model architectures (PaLM 2, Gemma, Llama), sizes, and training stages. Finally, we develop two novel techniques to modulate how new knowledge affects existing model behavior: (1) a "stepping-stone" text augmentation strategy and (2) an "ignore-k" update pruning method. These approaches reduce undesirable priming effects by 50-95% while preserving the model's ability to learn new information. Our findings provide both empirical insights into how LLMs learn and practical tools for improving the specificity of knowledge insertion in language models. Further materials: https://sunchipsster1.github.io/projects/outlandish/
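Two quantitative ideas in the abstract, predicting priming from a keyword's pre-learning token probability and pruning parameter updates with "ignore-k," can be illustrated with a minimal pure-Python sketch. The abstract does not define either procedure precisely, so what follows rests on assumptions and is not the authors' implementation. `keyword_probability` assumes the pre-learning signal is the softmax probability of the keyword's token given a vector of next-token logits. `ignore_top_k` and `apply_update` (hypothetical names) assume that ignore-k discards the top-k fraction of an update's entries by absolute magnitude before the update is applied.

```python
import math


def keyword_probability(logits: list[float], keyword_id: int) -> float:
    """Softmax probability of token `keyword_id` under next-token `logits`.

    Assumption: the pre-learning "token probability of key words" is taken to
    be this single-position probability; aggregation over contexts is omitted.
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[keyword_id] / sum(exps)


def ignore_top_k(update: list[float], k_fraction: float) -> list[float]:
    """Hypothetical "ignore-k" pruning: zero the k_fraction of entries with the
    largest absolute value and keep all other entries unchanged."""
    if not 0.0 <= k_fraction <= 1.0:
        raise ValueError("k_fraction must be in [0, 1]")
    # Number of entries to ignore, rounded down.
    n_drop = math.floor(k_fraction * len(update))
    # Indices ordered from largest to smallest magnitude (ties keep original order).
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else u for i, u in enumerate(update)]


def apply_update(
    params: list[float], grads: list[float], lr: float, k_fraction: float
) -> list[float]:
    """One plain gradient-descent step whose update is pruned by ignore_top_k."""
    update = [-lr * g for g in grads]
    pruned = ignore_top_k(update, k_fraction)
    return [p + u for p, u in zip(params, pruned)]


if __name__ == "__main__":
    print(keyword_probability([2.0, 0.5, -1.0], keyword_id=1))
    print(apply_update([1.0, 1.0, 1.0, 1.0], [0.1, -2.0, 0.3, 0.05], lr=0.1, k_fraction=0.25))
```

With `k_fraction=0.25` on a four-parameter update, only the single largest-magnitude entry (here the one driven by the gradient `-2.0`) is dropped, and the other three parameters move as in ordinary gradient descent. The abstract reports that the pre-learning keyword probability predicts the degree of priming; this sketch only computes that quantity and does not model the relationship itself.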
