Faith and Fate: Limits of Transformers on Compositionality
May 29, 2023
Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
cs.AI
Abstract
Transformer large language models (LLMs) have sparked admiration for their
exceptional performance on tasks that demand intricate multi-step reasoning.
Yet, these models simultaneously show failures on surprisingly trivial
problems. This begs the question: Are these errors incidental, or do they
signal more substantial limitations? In an attempt to demystify Transformers,
we investigate the limits of these models across three representative
compositional tasks -- multi-digit multiplication, logic grid puzzles, and a
classic dynamic programming problem. These tasks require breaking problems down
into sub-steps and synthesizing these steps into a precise answer. We formulate
compositional tasks as computation graphs to systematically quantify the level
of complexity, and break down reasoning steps into intermediate sub-procedures.
Our empirical findings suggest that Transformers solve compositional tasks by
reducing multi-step compositional reasoning into linearized subgraph matching,
without necessarily developing systematic problem-solving skills. To round off
our empirical study, we provide theoretical arguments on abstract multi-step
reasoning problems that highlight how Transformers' performance will rapidly
decay with increased task complexity.
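The abstract's computation-graph formulation can be illustrated with a small sketch. This is not the authors' code, and all function and node names are illustrative: it builds a DAG for long multiplication, where leaves are input digits, internal nodes are one-digit partial products and their pairwise sums, and graph size and depth act as rough proxies for the task complexity the paper quantifies.

```python
def multiplication_graph(a: int, b: int):
    """Build an illustrative DAG for long multiplication of a * b.

    Returns (edges, depth): `edges` maps each derived node to the nodes
    it is computed from; `depth` is the longest path from an input digit,
    a simple proxy for the number of sequential reasoning steps.
    """
    a_digits = [int(d) for d in str(a)]
    b_digits = [int(d) for d in str(b)]
    edges, depth = {}, {}

    # Input layer: the digits of each operand sit at depth 0.
    for i in range(len(a_digits)):
        depth[f"a{i}"] = 0
    for j in range(len(b_digits)):
        depth[f"b{j}"] = 0

    # One-digit partial products a_i * b_j, each depending on two inputs.
    for i in range(len(a_digits)):
        for j in range(len(b_digits)):
            node = f"p{i}{j}"
            edges[node] = [f"a{i}", f"b{j}"]
            depth[node] = 1

    # Summation tree: fold the partial products pairwise until one
    # result node remains, increasing depth at every level.
    frontier = [f"p{i}{j}" for i in range(len(a_digits))
                for j in range(len(b_digits))]
    step = 0
    while len(frontier) > 1:
        nxt = []
        for k in range(0, len(frontier) - 1, 2):
            node = f"s{step}_{k}"
            edges[node] = [frontier[k], frontier[k + 1]]
            depth[node] = max(depth[frontier[k]], depth[frontier[k + 1]]) + 1
            nxt.append(node)
        if len(frontier) % 2:       # carry an unpaired node forward
            nxt.append(frontier[-1])
        frontier = nxt
        step += 1

    return edges, max(depth.values())

edges, d = multiplication_graph(123, 45)
print(len(edges), d)  # prints "11 4": graph size and depth grow with digit count
```

Counting nodes and the longest path in such a graph makes the abstract's claim concrete: adding digits to either operand grows both the number of sub-steps and the sequential depth, which is the axis along which the paper reports rapid performance decay.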