

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

October 8, 2024
Authors: Zheyang Xiong, Ziyang Cai, John Cooper, Albert Ge, Vasilis Papageorgiou, Zack Sifakis, Angeliki Giannou, Ziqian Lin, Liu Yang, Saurabh Agarwal, Grigorios G Chrysos, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term "task superposition". We provide empirical evidence of this phenomenon across various LLM families and scales and show that this phenomenon emerges even if we train the model to in-context learn one task at a time. We offer theoretical explanations that this capability is well within the expressive power of transformers. We also explore how LLMs internally compose task vectors during superposition. Furthermore, we show that larger models can solve more ICL tasks in parallel, and better calibrate their output distribution. Our findings offer insights into the latent capabilities of LLMs, further substantiate the perspective of "LLMs as superposition of simulators", and raise questions about the mechanisms enabling simultaneous task execution.
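The abstract does not include code, but the "task superposition" setup it describes can be probed with a short experiment: mix in-context demonstrations from two computationally distinct tasks in a single prompt, then inspect how the model's next-token distribution splits probability mass between each task's answer. The sketch below is illustrative only and is not the authors' code; the model choice (gpt2), the two toy tasks, and the prompt format are assumptions made for the example.

```python
# Minimal sketch (assumptions: gpt2 as the model, two toy tasks chosen for illustration):
# mix in-context examples from two distinct tasks in one prompt and check how much
# probability the model assigns to each task's answer for the final query.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works; the paper reports the effect strengthens with scale
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Task A: copy the word in uppercase.  Task B: return the word's first letter.
# Demonstrations for both tasks are interleaved, so the prompt encodes both at once.
prompt = (
    "apple -> APPLE\n"
    "apple -> a\n"
    "stone -> STONE\n"
    "stone -> s\n"
    "river -> RIVER\n"
    "river -> r\n"
    "dream ->"
)

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

# Compare the mass placed on the first token of each task's answer for the query "dream".
for label, answer in [("Task A (uppercase)", " DREAM"), ("Task B (first letter)", " d")]:
    first_token_id = tok.encode(answer)[0]
    print(f"{label}: p(first answer token) = {probs[first_token_id].item():.4f}")
```

Under the paper's findings, a sufficiently capable model places non-trivial probability on both continuations, rather than collapsing onto a single task.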
