When can transformers reason with abstract symbols?

Abstract

We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications that add two trainable parameters per head and can reduce the amount of training data needed.
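The abstract does not spell out the modification beyond "two trainable parameters per head". As a minimal sketch, assuming (our assumption, not stated in this abstract) that the two parameters are scalars a and b added as scaled identities to the query-key product W_Q W_K^T and the value-output product W_V W_O, a single self-attention head can be written in plain Python as below. The names attention_head, a, and b are hypothetical. One plausible motivation: the a*I term scores token pairs by the raw inner product of their embeddings, so a head can match repeated symbols even before W_Q and W_K have learned anything.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python product of A (n x k) and B (k x m)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def add_scaled_identity(M: Matrix, c: float) -> Matrix:
    """Return M + c * I for a square matrix M."""
    n = len(M)
    return [[M[i][j] + (c if i == j else 0.0) for j in range(n)] for i in range(n)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(X: Matrix, W_Q: Matrix, W_K: Matrix, W_V: Matrix, W_O: Matrix,
                   a: float = 0.0, b: float = 0.0) -> Matrix:
    """Hypothetical single self-attention head (no mask) on X of shape n x d.

    a = b = 0 gives an ordinary head. The two extra trainable scalars are an
    ASSUMED parametrization: a*I is added to the d x d query-key product
    W_Q W_K^T and b*I to the d x d value-output product W_V W_O.
    """
    d = len(X[0])
    qk = add_scaled_identity(matmul(W_Q, transpose(W_K)), a)  # d x d
    vo = add_scaled_identity(matmul(W_V, W_O), b)             # d x d
    scores = matmul(matmul(X, qk), transpose(X))              # n x n logits
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, matmul(X, vo))                     # n x d output


if __name__ == "__main__":
    # One-hot embeddings for the symbol sequence A B A (d = 3).
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    zeros = [[0.0] * 3 for _ in range(3)]
    # With all weight matrices at zero, only a and b shape the output.
    for a, b in [(0.0, 0.0), (0.0, 1.0), (10.0, 1.0)]:
        out = attention_head(X, zeros, zeros, zeros, zeros, a=a, b=b)
        print(a, b, [[round(v, 3) for v in row] for row in out])
```

In this toy run, with every weight matrix at zero, a = b = 0 outputs all zeros; a = 0, b = 1 gives uniform attention, so each output row is [2/3, 1/3, 0]; and a = 10, b = 1 makes each position attend almost entirely to the positions holding the same symbol, so each row approximately reproduces its own one-hot token.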
