STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Abstract



Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLMs). Consequently, most approaches approximate this effect in parameter space using gradients. However, tracking gradients across billions of parameters is not only prohibitively expensive but also relies on local approximations. In this work, we propose a shift: rather than estimating parameter changes, we model the functional effect of training data in activation space. We introduce STRIDE (Steering-based Training Data Influence Decomposition), a framework that formulates TDA as a sparse recovery problem in the spirit of compressive sensing. STRIDE learns lightweight "steering operators" that mimic the behavioral shift caused by training on data subsets. By measuring how these operators perturb test predictions, we recover the influence of individual training examples via sparse linear decomposition. STRIDE achieves state-of-the-art performance on LLM pre-training attribution while being an order of magnitude (13×) faster than prior methods. We further validate its practical utility through downstream applications including data selection, data contamination, and qualitative analysis.
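The abstract frames attribution as sparse recovery in the spirit of compressive sensing: each data subset yields one measurement of how a test prediction shifts, and per-example influences are read off a sparse linear decomposition. The sketch below illustrates only that decomposition step, under assumptions that are not taken from the paper: influences combine additively, subsets are random half-size draws, the measured effects are simulated rather than produced by steering operators, and the solver is plain ISTA for the Lasso. The names `recover_influences`, `masks`, `effects`, and `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def recover_influences(masks: np.ndarray, effects: np.ndarray,
                       lam: float = 0.5, n_iter: int = 3000) -> np.ndarray:
    """Hypothetical sparse-recovery step (not STRIDE's actual estimator).

    Solves min_x 0.5 * ||A_c x - y_c||^2 + lam * ||x||_1 with ISTA, where
    A_c is the column-centered mask matrix and y_c the centered effects.

    masks:   (m, n) 0/1 matrix, masks[j, i] = 1 if example i is in subset j.
    effects: (m,)   measured shift of the test prediction for each subset.
    """
    # Centering removes the shared all-ones direction of 0/1 masks; under the
    # assumed additive model y = A x it is exact: y - mean(y) = (A - colmean(A)) x.
    A = masks - masks.mean(axis=0, keepdims=True)
    y = effects - effects.mean()
    # Step size 1/L, where L = largest singular value squared (gradient Lipschitz constant).
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)  # gradient of the least-squares term
        x = soft_threshold(x - step * grad, step * lam)  # proximal (L1) step
    return x


# Simulated setting: 200 training examples, only 5 of which influence the test output.
rng = np.random.default_rng(0)
n, m, k = 200, 100, 5
support = rng.choice(n, size=k, replace=False)
x_true = np.zeros(n)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(0.5, 1.5, size=k)

masks = (rng.random((m, n)) < 0.5).astype(float)  # random half-size subsets
effects = masks @ x_true  # noise-free stand-in for steering-operator measurements

x_hat = recover_influences(masks, effects)
top_k = np.argsort(-np.abs(x_hat))[:k]
print(sorted(top_k.tolist()) == sorted(support.tolist()))
```

With 100 subset measurements for 200 examples, the five nonzero influences should dominate the recovered vector. In a real pipeline, `effects` would instead come from applying learned subset operators to a test input, and measurement noise would call for a larger `lam`.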