A Closer Look at In-Context Learning under Distribution Shifts

Abstract

In-context learning, the ability of a model to learn from input examples on the fly without weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed by Garg et al. (2022) to better understand the generality and limitations of in-context learning through the lens of a simple yet fundamental task: linear regression. The key question we aim to address is: are transformers more adept than natural, simpler architectures at performing in-context learning under varying distribution shifts? As a point of comparison for transformers, we propose a simple architecture based on set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and set-based MLPs exhibit in-context learning under in-distribution evaluations, but transformers more closely emulate the performance of ordinary least squares (OLS). Transformers are also more resilient to mild distribution shifts, where set-based MLPs falter. Under severe distribution shifts, however, the in-context learning abilities of both models diminish.
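To make the OLS reference point concrete, the following is a minimal sketch of one in-context linear regression prompt and its least-squares prediction. The Gaussian inputs, the noiseless labels, the function names `sample_prompt` and `ols_predict`, and the `x_scale` parameter (a toy stand-in for a shift applied to the query input) are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' evaluation code.

```python
import numpy as np


def sample_prompt(rng, n_examples=20, dim=5, x_scale=1.0):
    """Draw one hypothetical in-context regression prompt.

    Assumption: inputs and the task vector are standard Gaussian and labels
    are noiseless. x_scale != 1.0 rescales only the query input, as a toy
    stand-in for a shift between the context examples and the query.
    """
    w = rng.standard_normal(dim)                  # hidden task vector
    xs = rng.standard_normal((n_examples, dim))   # in-context inputs
    ys = xs @ w                                   # in-context labels
    x_query = x_scale * rng.standard_normal(dim)  # (possibly shifted) query
    y_true = x_query @ w                          # label the model should predict
    return xs, ys, x_query, y_true


def ols_predict(xs, ys, x_query):
    """Fit least squares on the context pairs and predict the query label."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat


rng = np.random.default_rng(0)
xs, ys, x_query, y_true = sample_prompt(rng, n_examples=20, dim=5, x_scale=3.0)
y_pred = ols_predict(xs, ys, x_query)
squared_error = (y_pred - y_true) ** 2
```

Under these assumptions, with more context examples than input dimensions and no label noise, least squares recovers the task vector up to floating-point error, so its prediction stays accurate even when the query input is rescaled. A learned in-context model would be compared against this reference by feeding it the same `(xs, ys, x_query)` prompt.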
