When can transformers reason with abstract symbols?

October 15, 2023
作者: Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
cs.AI

Abstract

We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications which can reduce the amount of data needed by adding two trainable parameters per head.
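The abstract does not spell out the form of these per-head modifications. Purely as an illustration, the sketch below shows one plausible way to give a single attention head two extra trainable scalar parameters; the class and parameter names (AugmentedAttentionHead, alpha, beta) and the specific mixing scheme are assumptions for illustration, not the authors' construction.

```python
# Hypothetical sketch (not the paper's exact construction): a single attention
# head augmented with two trainable scalars, alpha and beta, which mix the
# learned attention pattern with a fixed identity ("copy the current token")
# pattern and a fixed uniform-averaging pattern.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentedAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        # The two extra trainable parameters for this head.
        self.alpha = nn.Parameter(torch.zeros(1))  # weight on the identity pattern
        self.beta = nn.Parameter(torch.zeros(1))   # weight on the uniform pattern
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        attn = F.softmax(self.q(x) @ self.k(x).transpose(-2, -1) * self.scale, dim=-1)
        eye = torch.eye(T, device=x.device).expand(B, T, T)
        uniform = torch.full((B, T, T), 1.0 / T, device=x.device)
        # Mix the learned attention with the two fixed patterns, then aggregate values.
        mixed = attn + self.alpha * eye + self.beta * uniform
        return mixed @ self.v(x)
```

With alpha and beta initialized to zero, this head behaves exactly like a standard softmax attention head, so the extra parameters only change behavior if training finds them useful.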