Studying Large Language Model Generalization with Influence Functions

Abstract

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
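The counterfactual that influence functions target can be illustrated on a toy problem. The sketch below is not the EK-FAC pipeline described in the abstract: it assumes a small ridge-regression model whose Hessian can be formed and solved exactly, and the names `fit_ridge` and `influence_on_prediction` are hypothetical. It applies the standard first-order influence formula, -∇f(θ)ᵀ H⁻¹ ∇L(z_m, θ), to a query prediction f, then compares the result with re-solving the problem after adding the candidate example with a small weight ε.

```python
import numpy as np


def fit_ridge(X, y, lam, extra=None, eps=0.0):
    """Closed-form minimizer of (1/2N) * sum of squared errors + (lam/2) * ||theta||^2.

    If `extra` = (x_m, y_m) is given, that example's squared-error loss is
    added to the objective with weight `eps`.
    """
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    if extra is not None:
        x_m, y_m = extra
        A = A + eps * np.outer(x_m, x_m)
        b = b + eps * y_m * x_m
    return np.linalg.solve(A, b)


def influence_on_prediction(X, y, lam, x_m, y_m, x_q):
    """First-order influence of adding (x_m, y_m) on the query prediction x_q @ theta."""
    n, d = X.shape
    theta = fit_ridge(X, y, lam)
    H = X.T @ X / n + lam * np.eye(d)      # exact Hessian of the training objective
    grad_m = (x_m @ theta - y_m) * x_m     # gradient of the candidate's loss at theta
    ihvp = np.linalg.solve(H, x_q)         # inverse-Hessian-vector product with grad f = x_q
    return -ihvp @ grad_m


# Synthetic data, chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
x_m, y_m = rng.normal(size=3), 1.0   # candidate training example
x_q = rng.normal(size=3)             # query input
lam = 0.1

predicted = influence_on_prediction(X, y, lam, x_m, y_m, x_q)

# Counterfactual check: actually add the example with a small weight and re-solve.
eps = 1e-6
theta_0 = fit_ridge(X, y, lam)
theta_eps = fit_ridge(X, y, lam, extra=(x_m, y_m), eps=eps)
finite_diff = (x_q @ theta_eps - x_q @ theta_0) / eps

print(predicted, finite_diff)
```

In this toy setting the linear solve is trivial. For an LLM, the same inverse-Hessian-vector product is the step the abstract identifies as the bottleneck, which is where an approximation such as EK-FAC comes in.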
