Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Abstract

Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously during a single inference call, a capability we term "task superposition". We provide empirical evidence of this phenomenon across various LLM families and scales, and show that it emerges even when the model is trained to in-context learn only one task at a time. We offer a theoretical explanation showing that this capability is well within the expressive power of transformers. We also explore how LLMs internally compose task vectors during superposition. Furthermore, we show that larger models can solve more ICL tasks in parallel and better calibrate their output distribution. Our findings offer insights into the latent capabilities of LLMs, further substantiate the perspective of "LLMs as superposition of simulators", and raise questions about the mechanisms that enable simultaneous task execution.
