A Closer Look at In-Context Learning under Distribution Shifts

Abstract

In-context learning, the ability of a model to learn from input examples on the fly without weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed in (Garg et al., 2022) to better understand the generality and limitations of in-context learning through the lens of the simple yet fundamental task of linear regression. The key question we aim to address is: Are transformers more adept than some natural and simpler architectures at performing in-context learning under varying distribution shifts? As a point of comparison for transformers, we propose a simple architecture based on set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and set-based MLPs exhibit in-context learning under in-distribution evaluations, but transformers more closely emulate the performance of ordinary least squares (OLS). Transformers are also more resilient to mild distribution shifts, under which set-based MLPs falter. However, under severe distribution shifts, the in-context learning abilities of both models diminish.
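As a rough illustration of the evaluation setting and the OLS reference point, the following sketch samples linear-regression prompts and scores an OLS fit on the context examples. The Gaussian sampling, the noiseless labels, the dimensions, and the `query_scale` parameter (a hypothetical stand-in for a distribution shift between context and query) are all assumptions made for this example; they are not taken from the paper, and no transformer or set-based MLP is trained here.

```python
import numpy as np


def sample_prompt(rng, n_context, dim, query_scale=1.0):
    """Sample one linear-regression prompt: context pairs plus a query.

    query_scale != 1.0 rescales only the query input. This is an assumed,
    illustrative form of distribution shift, not the paper's protocol.
    """
    w = rng.standard_normal(dim)                 # hidden task vector
    xs = rng.standard_normal((n_context, dim))   # context inputs
    ys = xs @ w                                  # noiseless context labels
    x_query = query_scale * rng.standard_normal(dim)
    y_query = x_query @ w
    return xs, ys, x_query, y_query


def ols_predict(xs, ys, x_query):
    """Fit least squares on the context examples and predict the query label."""
    # lstsq returns the minimum-norm solution when n_context < dim.
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat


def mean_squared_error(n_prompts, n_context, dim, query_scale, seed=0):
    """Average squared query error of the OLS reference over many prompts."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_prompts):
        xs, ys, x_q, y_q = sample_prompt(rng, n_context, dim, query_scale)
        errors.append((ols_predict(xs, ys, x_q) - y_q) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    for scale in (1.0, 3.0):
        err = mean_squared_error(200, n_context=20, dim=5, query_scale=scale)
        print(f"query_scale={scale}: OLS mean squared error = {err:.3e}")
```

In this sketch, once the context holds at least `dim` examples and the labels are noiseless, OLS recovers the task vector up to floating-point error, so its query error stays near zero for any `query_scale`. That is a property of these assumptions rather than a result reported in the paper, but it suggests why OLS serves as a natural reference when judging how far a learned model degrades under shift.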
