A Closer Look at In-Context Learning under Distribution Shifts
May 26, 2023
Authors: Kartik Ahuja, David Lopez-Paz
cs.AI
Abstract
In-context learning, a capability that enables a model to learn from input
examples on the fly without necessitating weight updates, is a defining
characteristic of large language models. In this work, we follow the setting
proposed by Garg et al. (2022) to better understand the generality and
limitations of in-context learning from the lens of the simple yet fundamental
task of linear regression. The key question we aim to address is: Are
transformers more adept than some natural and simpler architectures at
performing in-context learning under varying distribution shifts? As a point
of comparison with transformers, we propose a simple architecture based on
set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and
set-based MLPs exhibit in-context learning under in-distribution evaluations, but
transformers more closely emulate the performance of ordinary least squares
(OLS). Transformers also display better resilience to mild distribution shifts,
where set-based MLPs falter. However, under severe distribution shifts, both
models' in-context learning abilities diminish.
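
The evaluation builds on the Garg et al. (2022) setting: each prompt is a fresh linear regression task, a learner sees a handful of labeled pairs in context, and its prediction at a query point is compared against ordinary least squares fit on those same pairs. The sketch below is a minimal illustration of that setup and the OLS baseline, not the authors' code; the dimensions, noise level, and the `query_shift` knob (a stand-in for the paper's distribution shifts) are illustrative assumptions.

```python
import numpy as np

def sample_task(d, k, rng, query_shift=0.0, noise=0.1):
    """One in-context task: k labeled pairs plus a query point.

    query_shift offsets the query's mean to mimic a shift between the
    context and query distributions; it is an illustrative knob, not
    the paper's exact shift protocol."""
    w = rng.standard_normal(d)                  # task-specific weights
    X = rng.standard_normal((k, d))             # in-context inputs
    y = X @ w + noise * rng.standard_normal(k)  # noisy labels
    x_q = rng.standard_normal(d) + query_shift  # (possibly shifted) query
    return X, y, x_q, x_q @ w                   # last entry: noiseless target

def ols_predict(X, y, x_q):
    """Ordinary least squares fit on the context, evaluated at the query."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_q @ w_hat

rng = np.random.default_rng(0)
for shift in (0.0, 2.0, 8.0):  # in-distribution, mild, severe query shift
    sq_errs = []
    for _ in range(2000):
        X, y, x_q, y_true = sample_task(d=8, k=16, rng=rng, query_shift=shift)
        sq_errs.append((ols_predict(X, y, x_q) - y_true) ** 2)
    print(f"query_shift={shift}: OLS mean squared error = {np.mean(sq_errs):.4f}")
```

In the paper, a trained transformer or set-based MLP plays the role of `ols_predict`, mapping the context pairs and query to a prediction in a single forward pass; OLS serves as the reference its in-context performance is measured against.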