A Closer Look at In-Context Learning under Distribution Shifts
May 26, 2023
作者: Kartik Ahuja, David Lopez-Paz
cs.AI
Abstract
In-context learning, a capability that enables a model to learn from input
examples on the fly without necessitating weight updates, is a defining
characteristic of large language models. In this work, we follow the setting
proposed by Garg et al. (2022) to better understand the generality and
limitations of in-context learning from the lens of the simple yet fundamental
task of linear regression. The key question we aim to address is: Are
transformers more adept than some natural and simpler architectures at
performing in-context learning under varying distribution shifts? As a point of
comparison with transformers, we propose a simple architecture built on
set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and set-based
MLPs exhibit in-context learning under in-distribution evaluations, but
transformers more closely emulate the performance of ordinary least squares
(OLS). Transformers also display better resilience to mild distribution shifts,
where set-based MLPs falter. However, under severe distribution shifts, both
models' in-context learning abilities diminish.
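The evaluation style described above can be illustrated with a minimal NumPy sketch of the in-context linear regression task and its OLS baseline. The specific dimensions, the noiseless labels, and the mean-shifted query distribution are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(d=8, k=16, shift=0.0):
    """Sample one in-context linear regression prompt in the spirit of the
    Garg et al. (2022) setup (dimensions and shift here are assumptions)."""
    w = rng.normal(size=d)           # ground-truth weight vector
    X = rng.normal(size=(k, d))      # in-context inputs ~ N(0, I)
    y = X @ w                        # noiseless labels
    # Query input from a (possibly) shifted distribution: N(shift, I).
    x_q = rng.normal(loc=shift, size=d)
    return X, y, x_q, x_q @ w

def ols_predict(X, y, x_q):
    """OLS baseline: fit least squares on the prompt, predict the query."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_q @ w_hat

# With k >= d and no label noise, OLS recovers w exactly from the prompt,
# so its query error stays near zero even when the query distribution shifts.
# A learned in-context model is compared against this reference.
errs = [abs(ols_predict(*sample_task(shift=s)[:3])
            - sample_task(shift=s)[3]) for s in (0.0,)]
X, y, x_q, y_true = sample_task(shift=2.0)
print(abs(ols_predict(X, y, x_q) - y_true))
```

In this noiseless, over-determined regime OLS is the natural gold standard the paper measures transformers and set-based MLPs against; under shift, the interesting question is how far each learned model's predictions drift from this baseline.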