Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
October 8, 2024
作者: Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable in-context learning
(ICL) capabilities. In this study, we explore a surprising phenomenon related
to ICL: LLMs can perform multiple, computationally distinct ICL tasks
simultaneously, during a single inference call, a capability we term "task
superposition". We provide empirical evidence of this phenomenon across various
LLM families and scales and show that this phenomenon emerges even if we train
the model to in-context learn one task at a time. We offer theoretical
explanations that this capability is well within the expressive power of
transformers. We also explore how LLMs internally compose task vectors during
superposition. Furthermore, we show that larger models can solve more ICL tasks
in parallel, and better calibrate their output distribution. Our findings offer
insights into the latent capabilities of LLMs, further substantiate the
perspective of "LLMs as superposition of simulators", and raise questions about
the mechanisms enabling simultaneous task execution.
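As a rough illustration of the setup the abstract describes (our sketch, not the authors' code), the snippet below mixes few-shot examples from two unrelated in-context tasks into a single prompt and compares the probability mass a causal LM assigns to each task's answer for the same query. The model name and the two example tasks are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch of probing "task superposition": interleave demonstrations of
# two computationally distinct ICL tasks in one prompt, then inspect how the
# model spreads next-token probability over both tasks' answers.
# Assumes Hugging Face transformers; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; the paper evaluates several LLM families and scales
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Task A: country -> capital.  Task B: word -> uppercase.  (Hypothetical example tasks.)
examples = [
    "France -> Paris",
    "apple -> APPLE",
    "Japan -> Tokyo",
    "river -> RIVER",
]
query = "Germany ->"
prompt = "\n".join(examples + [query])

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the token following the query
probs = torch.softmax(logits, dim=-1)

# Compare the mass assigned to the first token of each task's candidate answer.
for answer in [" Berlin", " GERMANY"]:
    tok_id = tokenizer.encode(answer)[0]
    print(f"P(first token of {answer!r}) = {probs[tok_id].item():.4f}")
```

If the model exhibits task superposition in the sense the abstract describes, both candidate continuations should receive non-trivial probability, with larger models expected to calibrate this mixture better.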