In-Context Learning Creates Task Vectors

Abstract

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector theta(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
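The two-stage view described in the abstract (compress S into theta(S), then answer the query using only x and theta(S)) can be illustrated with a toy sketch. This is not a transformer and not the authors' procedure: the `SKILLS` list, `compress`, and `apply_task` are hypothetical names chosen for illustration, and the fixed list of string functions merely stands in for whatever a pretrained model can already compute.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Demo = Tuple[str, str]  # one (input, output) demonstration from the training set S

# Hypothetical fixed "skills": a stand-in for the frozen, pretrained model.
SKILLS: List[Callable[[str], str]] = [
    str.upper,           # map a word to upper case
    str.lower,           # map a word to lower case
    lambda s: s[::-1],   # reverse a word
    lambda s: s[:1],     # keep only the first letter
]


def compress(demos: Sequence[Demo]) -> Vector:
    """Toy theta(S): the fraction of demonstrations each skill explains.

    The result has a fixed size (one entry per skill), no matter how many
    demonstrations S contains.
    """
    if not demos:
        raise ValueError("need at least one demonstration")
    return [
        sum(skill(x) == y for x, y in demos) / len(demos)
        for skill in SKILLS
    ]


def apply_task(query: str, theta: Vector) -> str:
    """Toy f(x; theta): answer using only the query and the task vector.

    The demonstrations themselves are never consulted here; theta alone
    selects which fixed skill is applied to the query.
    """
    best = max(range(len(SKILLS)), key=lambda i: theta[i])
    return SKILLS[best](query)
```

For example, with the demonstrations `("abc", "cba")` and `("hello", "olleh")`, `compress` returns `[0.0, 0.0, 1.0, 0.0]`, and `apply_task("task", theta)` returns `"ksat"`. The point of the sketch is only the interface: once theta has been computed, the demonstrations can be discarded and new queries answered from theta alone.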
