When can transformers reason with abstract symbols?
October 15, 2023
Authors: Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
cs.AI
Abstract
We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications which can reduce the amount of data needed by adding two trainable parameters per head.
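
The abstract does not specify the exact form of the proposed modification beyond "two trainable parameters per head." As a purely illustrative sketch, the PyTorch snippet below assumes those parameters act as learned scalar weights inside each attention head, one scaling the standard content-based attention logits and one scaling an added identity term; the class name, the gating placement, and the initialization are assumptions for illustration, not the paper's construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedAttentionHead(nn.Module):
    """Single attention head with two extra trainable scalars per head.

    Hypothetical sketch: the paper only states that its modification adds
    two trainable parameters per head; where and how they enter the
    attention computation here is an assumption, not the paper's method.
    """

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        # The two extra trainable parameters for this head.
        self.alpha = nn.Parameter(torch.tensor(1.0))  # scales content logits
        self.beta = nn.Parameter(torch.tensor(0.0))   # scales an identity term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        logits = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        eye = torch.eye(x.shape[1], device=x.device)
        # Mix content-based logits with an identity term, each weighted
        # by one of the two trainable scalars, then attend as usual.
        attn = F.softmax(self.alpha * logits + self.beta * eye, dim=-1)
        return attn @ v


# Quick shape check.
head = GatedAttentionHead(d_model=32, d_head=8)
out = head(torch.randn(2, 5, 32))
print(out.shape)  # torch.Size([2, 5, 8])
```

The point of the sketch is only to show how cheap such a change is: the two scalars add a negligible parameter count per head while giving the model an explicit knob for trading off content-based matching against a fixed structural component.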