STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
June 3, 2026
Authors: Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye, Amirali Abdullah, Bernhard Schölkopf, Zhijing Jin
cs.AI
Abstract
Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLMs). Consequently, most approaches approximate this effect in the parameter space using gradients. However, tracking gradients across billions of parameters is not only prohibitively expensive but also relies on local approximations. In this work, we propose a shift: rather than estimating parameter changes, we model the functional effect of training data in the activation space. We introduce STRIDE (Steering-based Training Data Influence Decomposition), a framework that formulates TDA as a sparse recovery problem in the spirit of compressive sensing. STRIDE learns lightweight "steering operators" that mimic the behavioral shift caused by training on data subsets. By measuring how these operators perturb test predictions, we recover the influence of individual training examples via sparse linear decomposition. STRIDE achieves state-of-the-art results on LLM pre-training attribution while being an order of magnitude (13×) faster than prior methods. We further validate its practical utility through downstream applications including data selection, data contamination detection, and qualitative analysis.
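The sparse-recovery idea in the abstract can be illustrated with a toy sketch. This is not the STRIDE implementation: the learned steering operators are replaced here by a synthetic linear response, and the solver (plain iterative soft-thresholding for the Lasso) is a standard stand-in chosen for this illustration. All names (`ista_lasso`, `demo`, `masks`, `shifts`) and all sizes are assumptions made for the example. The sketch assumes that the shift in a test prediction caused by a training subset is roughly the sum of the influences of the examples in that subset, and that only a few examples have non-negligible influence.

```python
import numpy as np

def ista_lasso(A, y, lam, n_iters=3000):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return x

def demo(n_train=200, n_subsets=100, n_influential=5, seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: only a few training examples matter.
    true_influence = np.zeros(n_train)
    support = rng.choice(n_train, size=n_influential, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_influential)
    true_influence[support] = signs * rng.uniform(1.0, 2.0, size=n_influential)

    # Each row is a random subset of the training set (1 = example included).
    masks = rng.integers(0, 2, size=(n_subsets, n_train)).astype(float)

    # Synthetic stand-in for the measured shift in a test prediction per subset
    # (assumed additive in the subset members, plus small noise).
    shifts = masks @ true_influence + 0.01 * rng.standard_normal(n_subsets)

    # Center columns and measurements so the constant offset drops out.
    A = masks - masks.mean(axis=0)
    y = shifts - shifts.mean()

    # Recover per-example influence from far fewer subsets than examples.
    estimate = ista_lasso(A, y, lam=0.5)
    recovered = np.argsort(-np.abs(estimate))[:n_influential]
    return set(support.tolist()), set(recovered.tolist()), estimate

if __name__ == "__main__":
    true_support, recovered_support, _ = demo()
    print("true influential examples:     ", sorted(true_support))
    print("recovered influential examples:", sorted(recovered_support))
```

In this toy setting, 100 subset measurements are used to estimate 200 per-example influences, which is only possible because the influence vector is assumed sparse. How well such an additive, sparse model describes a real LLM is an empirical question the sketch does not address.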