Faith and Fate: Limits of Transformers on Compositionality
May 29, 2023
Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
cs.AI
Abstract
Transformer large language models (LLMs) have sparked admiration for their
exceptional performance on tasks that demand intricate multi-step reasoning.
Yet, these models simultaneously show failures on surprisingly trivial
problems. This begs the question: Are these errors incidental, or do they
signal more substantial limitations? In an attempt to demystify Transformers,
we investigate the limits of these models across three representative
compositional tasks -- multi-digit multiplication, logic grid puzzles, and a
classic dynamic programming problem. These tasks require breaking problems down
into sub-steps and synthesizing these steps into a precise answer. We formulate
compositional tasks as computation graphs to systematically quantify the level
of complexity, and break down reasoning steps into intermediate sub-procedures.
Our empirical findings suggest that Transformers solve compositional tasks by
reducing multi-step compositional reasoning into linearized subgraph matching,
without necessarily developing systematic problem-solving skills. To round off
our empirical study, we provide theoretical arguments on abstract multi-step
reasoning problems that highlight how Transformers' performance will rapidly
decay with increased task complexity.
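The abstract's computation-graph formulation can be illustrated with a small sketch. This is not the authors' code, and all function and node names are illustrative: it builds a DAG for long multiplication, where leaves are input digits, internal nodes are one-digit partial products and their pairwise sums, and graph size and depth act as rough proxies for the task complexity the paper quantifies.

```python
def multiplication_graph(a: int, b: int):
    """Build an illustrative DAG for long multiplication of a * b.

    Returns (edges, depth): `edges` maps each derived node to the nodes
    it is computed from; `depth` is the longest path from an input digit,
    a simple proxy for the number of sequential reasoning steps.
    """
    a_digits = [int(d) for d in str(a)]
    b_digits = [int(d) for d in str(b)]
    edges, depth = {}, {}

    # Input layer: the digits of each operand sit at depth 0.
    for i in range(len(a_digits)):
        depth[f"a{i}"] = 0
    for j in range(len(b_digits)):
        depth[f"b{j}"] = 0

    # One-digit partial products a_i * b_j, each depending on two inputs.
    for i in range(len(a_digits)):
        for j in range(len(b_digits)):
            node = f"p{i}{j}"
            edges[node] = [f"a{i}", f"b{j}"]
            depth[node] = 1

    # Summation tree: fold the partial products pairwise until one
    # result node remains, increasing depth at every level.
    frontier = [f"p{i}{j}" for i in range(len(a_digits))
                for j in range(len(b_digits))]
    step = 0
    while len(frontier) > 1:
        nxt = []
        for k in range(0, len(frontier) - 1, 2):
            node = f"s{step}_{k}"
            edges[node] = [frontier[k], frontier[k + 1]]
            depth[node] = max(depth[frontier[k]], depth[frontier[k + 1]]) + 1
            nxt.append(node)
        if len(frontier) % 2:       # carry an unpaired node forward
            nxt.append(frontier[-1])
        frontier = nxt
        step += 1

    return edges, max(depth.values())

edges, d = multiplication_graph(123, 45)
print(len(edges), d)  # prints "11 4": graph size and depth grow with digit count
```

Counting nodes and the longest path in such a graph makes the abstract's claim concrete: adding digits to either operand grows both the number of sub-steps and the sequential depth, which is the axis along which the paper reports rapid performance decay.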