Studying Large Language Model Generalization with Influence Functions

Abstract

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
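
To make the counterfactual concrete, the following is a minimal sketch of a classical influence estimate on a toy damped least-squares model, where the Hessian is small enough to invert exactly. It is not the EK-FAC procedure and does not involve an LLM; the function names (`fit_ridge`, `loss_grad`, `influence`), the `damping` parameter, and the toy data are all assumptions introduced for illustration. The estimate it computes is the standard first-order form, the negative query gradient times the inverse Hessian times the candidate gradient.

```python
import numpy as np

def fit_ridge(X, y, damping, extra=None, eps=0.0):
    """Closed-form minimizer of mean squared loss plus an L2 damping term.

    If `extra=(x_m, y_m)` is given, that example's loss is added with
    weight `eps`, which is the counterfactual the influence estimate
    approximates. (Hypothetical helper for this sketch.)
    """
    n, d = X.shape
    A = X.T @ X / n + damping * np.eye(d)
    b = X.T @ y / n
    if extra is not None:
        x_m, y_m = extra
        A = A + eps * np.outer(x_m, x_m)
        b = b + eps * y_m * x_m
    return np.linalg.solve(A, b)

def loss_grad(theta, x, y):
    """Gradient of the per-example loss 0.5 * (x . theta - y)^2."""
    return (x @ theta - y) * x

def influence(X, y, damping, query, candidate):
    """First-order influence of upweighting `candidate` on the query loss.

    Computes -grad_query^T H^{-1} grad_candidate, where H is the Hessian
    of the damped training objective at the fitted parameters.
    """
    n, d = X.shape
    theta = fit_ridge(X, y, damping)
    H = X.T @ X / n + damping * np.eye(d)
    g_query = loss_grad(theta, *query)
    g_cand = loss_grad(theta, *candidate)
    # Exact inverse-Hessian-vector product; feasible only because d is tiny.
    ihvp = np.linalg.solve(H, g_query)
    return -ihvp @ g_cand

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
query = (np.array([0.3, -1.0, 2.0]), 0.7)
candidate = (np.array([1.0, 0.5, -0.2]), 2.0)

estimate = influence(X, y, 0.1, query, candidate)

# Sanity check: refit with the candidate upweighted by a small eps and
# compare the change in query loss against the first-order estimate.
def query_loss(theta):
    return 0.5 * (query[0] @ theta - query[1]) ** 2

eps = 1e-6
finite_diff = (
    query_loss(fit_ridge(X, y, 0.1, candidate, eps))
    - query_loss(fit_ridge(X, y, 0.1))
) / eps
print(estimate, finite_diff)
```

In this toy setting the IHVP is a single exact linear solve. For a model with billions of parameters that solve is the bottleneck, which is the step the abstract says is approximated with EK-FAC.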
