A Closer Look at In-Context Learning under Distribution Shifts
May 26, 2023
Authors: Kartik Ahuja, David Lopez-Paz
cs.AI
Abstract
In-context learning, a capability that enables a model to learn from input
examples on the fly without necessitating weight updates, is a defining
characteristic of large language models. In this work, we follow the setting
proposed by Garg et al. (2022) to better understand the generality and
limitations of in-context learning from the lens of the simple yet fundamental
task of linear regression. The key question we aim to address is: Are
transformers more adept than some natural and simpler architectures at
performing in-context learning under varying distribution shifts? As a point
of comparison with transformers, we propose a simple architecture based on
set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and
set-based MLPs exhibit in-context learning under in-distribution evaluations, but
transformers more closely emulate the performance of ordinary least squares
(OLS). Transformers also display better resilience to mild distribution shifts,
where set-based MLPs falter. However, under severe distribution shifts, both
models' in-context learning abilities diminish.
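
The evaluation builds on the Garg et al. (2022) setting: each prompt is a fresh linear regression task, a learner sees a handful of labeled pairs in context, and its prediction at a query point is compared against ordinary least squares fit on those same pairs. The sketch below is a minimal illustration of that setup and the OLS baseline, not the authors' code; the dimensions, noise level, and the `query_shift` knob (a stand-in for the paper's distribution shifts) are illustrative assumptions.

```python
import numpy as np

def sample_task(d, k, rng, query_shift=0.0, noise=0.1):
    """One in-context task: k labeled pairs plus a query point.

    query_shift offsets the query's mean to mimic a shift between the
    context and query distributions; it is an illustrative knob, not
    the paper's exact shift protocol."""
    w = rng.standard_normal(d)                  # task-specific weights
    X = rng.standard_normal((k, d))             # in-context inputs
    y = X @ w + noise * rng.standard_normal(k)  # noisy labels
    x_q = rng.standard_normal(d) + query_shift  # (possibly shifted) query
    return X, y, x_q, x_q @ w                   # last entry: noiseless target

def ols_predict(X, y, x_q):
    """Ordinary least squares fit on the context, evaluated at the query."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_q @ w_hat

rng = np.random.default_rng(0)
for shift in (0.0, 2.0, 8.0):  # in-distribution, mild, severe query shift
    sq_errs = []
    for _ in range(2000):
        X, y, x_q, y_true = sample_task(d=8, k=16, rng=rng, query_shift=shift)
        sq_errs.append((ols_predict(X, y, x_q) - y_true) ** 2)
    print(f"query_shift={shift}: OLS mean squared error = {np.mean(sq_errs):.4f}")
```

In the paper, a trained transformer or set-based MLP plays the role of `ols_predict`, mapping the context pairs and query to a prediction in a single forward pass; OLS serves as the reference its in-context performance is measured against.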