STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Abstract

Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLMs). Consequently, most approaches approximate this effect in parameter space using gradients. However, tracking gradients across billions of parameters is not only prohibitively expensive but also relies on local approximations. In this work, we propose a shift: rather than estimating parameter changes, we model the functional effect of training data in activation space. We introduce STRIDE (Steering-based Training Data Influence Decomposition), a framework that formulates TDA as a sparse recovery problem in the spirit of compressive sensing. STRIDE learns lightweight "steering operators" that mimic the behavioral shift caused by training on data subsets. By measuring how these operators perturb test predictions, we recover the influence of individual training examples via sparse linear decomposition. STRIDE achieves state-of-the-art performance on LLM pre-training attribution while being an order of magnitude (13×) faster than prior methods. We further validate its practical utility through downstream applications including data selection, data contamination, and qualitative analysis.
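To make the sparse recovery framing concrete, the following is a minimal sketch of the generic compressive sensing idea, not the STRIDE implementation. It assumes that the shift in a test prediction caused by a data subset is a linear sum of per-example influences and that only a few examples have nonzero influence. The steering operators are not modeled at all: the measured shifts are simulated directly from a hypothetical ground truth. All names (`ista`, `demo`, `membership`, `shifts`) and the choice of ISTA as the solver are illustrative assumptions.

```python
import numpy as np

def ista(A, y, lam=1.0, n_iters=500):
    """Minimize 0.5*||Ax - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    # Step size = 1 / Lipschitz constant of the gradient of the quadratic term.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        z = x - step * (A.T @ (A @ x - y))                         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-threshold
    return x

def demo(n_train=200, n_subsets=80, n_influential=5, seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: only a few training examples influence
    # the test prediction (the sparsity assumption).
    true_influence = np.zeros(n_train)
    support = rng.choice(n_train, size=n_influential, replace=False)
    true_influence[support] = rng.uniform(1.0, 2.0, size=n_influential)

    # Row i marks which training examples belong to random subset i.
    membership = rng.integers(0, 2, size=(n_subsets, n_train)).astype(float)

    # Simulated stand-in for the measured prediction shift of each subset,
    # assuming influences add up linearly within a subset.
    shifts = membership @ true_influence

    # Center the columns and the measurements; the linear relation is preserved
    # and the 0/1 design becomes better conditioned for sparse recovery.
    A = membership - membership.mean(axis=0)
    y = shifts - shifts.mean()

    estimate = ista(A, y)
    recovered = np.argsort(-np.abs(estimate))[:n_influential]
    return set(support.tolist()), set(recovered.tolist())

if __name__ == "__main__":
    true_support, recovered = demo()
    print("true influential examples:", sorted(true_support))
    print("recovered examples:       ", sorted(recovered))
```

In this toy setting, 80 subset measurements are used to estimate 200 per-example influences, which is only possible because the influence vector is assumed sparse. Whether real influence patterns are sparse enough, and how the shifts are actually measured, are questions the sketch does not address.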