Faith and Fate: Limits of Transformers on Compositionality
May 29, 2023
Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
cs.AI
Abstract
Transformer large language models (LLMs) have sparked admiration for their
exceptional performance on tasks that demand intricate multi-step reasoning.
Yet, these models simultaneously show failures on surprisingly trivial
problems. This raises the question: Are these errors incidental, or do they
signal more substantial limitations? In an attempt to demystify Transformers,
we investigate the limits of these models across three representative
compositional tasks -- multi-digit multiplication, logic grid puzzles, and a
classic dynamic programming problem. These tasks require breaking problems down
into sub-steps and synthesizing these steps into a precise answer. We formulate
compositional tasks as computation graphs to systematically quantify the level
of complexity, and break down reasoning steps into intermediate sub-procedures.
Our empirical findings suggest that Transformers solve compositional tasks by
reducing multi-step compositional reasoning into linearized subgraph matching,
without necessarily developing systematic problem-solving skills. To round off
our empirical study, we provide theoretical arguments on abstract multi-step
reasoning problems that highlight how Transformers' performance will rapidly
decay with increased task complexity.
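The computation-graph formulation described above can be illustrated with a minimal sketch (not the authors' code; the function name and node encoding are illustrative assumptions). It decomposes multi-digit multiplication into one-digit sub-steps, so that graph size serves as a proxy for the task's compositional complexity:

```python
# Minimal sketch (illustrative, not the paper's implementation):
# represent long multiplication as a computation graph whose nodes are
# one-digit operations, then use graph size as a complexity proxy.

def multiplication_graph(x: int, y: int):
    """Return a list of sub-step nodes (name, op, inputs) for x * y."""
    xs = [int(d) for d in str(x)][::-1]  # least-significant digit first
    ys = [int(d) for d in str(y)][::-1]
    nodes = []
    # Layer 1: all one-digit partial products x_i * y_j.
    for i, xd in enumerate(xs):
        for j, yd in enumerate(ys):
            nodes.append((f"p{i}{j}", "mul", (xd, yd)))
    # Layer 2: aggregate partial products sharing a place value
    # (including carry propagation, abstracted as one node per place).
    width = len(xs) + len(ys)
    for k in range(width):
        terms = tuple(f"p{i}{j}" for i in range(len(xs))
                      for j in range(len(ys)) if i + j == k)
        if terms:
            nodes.append((f"s{k}", "sum+carry", terms))
    return nodes

# The graph grows quickly with digit count, matching the claim that
# complexity rises with problem size.
print(len(multiplication_graph(12, 34)))    # 4 products + 3 sums
print(len(multiplication_graph(123, 456)))  # 9 products + 5 sums
```

Under this framing, a model that truly composes would execute every node; the paper's finding is that Transformers instead tend to pattern-match linearized subgraphs seen in training.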