Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
October 8, 2024
作者: Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable in-context learning
(ICL) capabilities. In this study, we explore a surprising phenomenon related
to ICL: LLMs can perform multiple, computationally distinct ICL tasks
simultaneously, during a single inference call, a capability we term "task
superposition". We provide empirical evidence of this phenomenon across various
LLM families and scales and show that this phenomenon emerges even if we train
the model to in-context learn one task at a time. We offer theoretical
explanations that this capability is well within the expressive power of
transformers. We also explore how LLMs internally compose task vectors during
superposition. Furthermore, we show that larger models can solve more ICL tasks
in parallel, and better calibrate their output distribution. Our findings offer
insights into the latent capabilities of LLMs, further substantiate the
perspective of "LLMs as superposition of simulators", and raise questions about
the mechanisms enabling simultaneous task execution.
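As a rough illustration of the setup the abstract describes (our sketch, not the authors' code), the snippet below mixes few-shot examples from two unrelated in-context tasks into a single prompt and compares the probability mass a causal LM assigns to each task's answer for the same query. The model name and the two example tasks are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch of probing "task superposition": interleave demonstrations of
# two computationally distinct ICL tasks in one prompt, then inspect how the
# model spreads next-token probability over both tasks' answers.
# Assumes Hugging Face transformers; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; the paper evaluates several LLM families and scales
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Task A: country -> capital.  Task B: word -> uppercase.  (Hypothetical example tasks.)
examples = [
    "France -> Paris",
    "apple -> APPLE",
    "Japan -> Tokyo",
    "river -> RIVER",
]
query = "Germany ->"
prompt = "\n".join(examples + [query])

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the token following the query
probs = torch.softmax(logits, dim=-1)

# Compare the mass assigned to the first token of each task's candidate answer.
for answer in [" Berlin", " GERMANY"]:
    tok_id = tokenizer.encode(answer)[0]
    print(f"P(first token of {answer!r}) = {probs[tok_id].item():.4f}")
```

If the model exhibits task superposition in the sense the abstract describes, both candidate continuations should receive non-trivial probability, with larger models expected to calibrate this mixture better.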