Faith and Fate: Limits of Transformers on Compositionality

Abstract


Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This raises the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify Transformers, we investigate the limits of these models across three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that Transformers solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how Transformers' performance will rapidly decay with increased task complexity.
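To make the idea of "compositional tasks as computation graphs" concrete, here is a minimal Python sketch for multi-digit multiplication. It rests on our own assumptions: the column-wise decomposition, the node types (`mul`, `add`, `mod`, `div`), and every name (`ComputationGraph`, `multiplication_graph`, `read_result`, `toy_all_correct_prob`) are hypothetical illustrations, not the paper's exact graph construction. Graph size (number of compute nodes) and depth (longest input-to-node path) serve as rough complexity measures. The last function is a toy independence model, also an assumption, that hints at why accuracy can fall quickly as graphs grow; it is not the paper's theoretical argument.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                                   # "input", "mul", "add", "mod", or "div"
    value: int                                # value computed at this node
    parents: list[int] = field(default_factory=list)


class ComputationGraph:
    def __init__(self) -> None:
        self.nodes: list[Node] = []

    def add(self, op: str, value: int, parents: tuple[int, ...] = ()) -> int:
        # Parents are always added before children, so list order is topological.
        self.nodes.append(Node(op, value, list(parents)))
        return len(self.nodes) - 1

    def depth(self) -> int:
        # Longest input-to-node path, counted in edges.
        d: list[int] = []
        for node in self.nodes:
            d.append(1 + max(d[p] for p in node.parents) if node.parents else 0)
        return max(d, default=0)

    def num_compute_nodes(self) -> int:
        return sum(1 for n in self.nodes if n.op != "input")


def multiplication_graph(x: int, y: int) -> tuple[ComputationGraph, list[int]]:
    """Hypothetical column-wise long-multiplication graph for non-negative x, y.

    Returns the graph and the ids of the output digit nodes (least significant first).
    """
    assert x >= 0 and y >= 0, "sketch assumes non-negative integers"
    g = ComputationGraph()
    xs = [g.add("input", int(c)) for c in reversed(str(x))]  # least significant digit first
    ys = [g.add("input", int(c)) for c in reversed(str(y))]
    # One single-digit product node per digit pair (i, j).
    prods = {(i, j): g.add("mul", g.nodes[a].value * g.nodes[b].value, (a, b))
             for i, a in enumerate(xs) for j, b in enumerate(ys)}
    out: list[int] = []
    carry = None
    for k in range(len(xs) + len(ys) - 1):
        # Column k sums every product with i + j == k, plus the incoming carry.
        terms = [n for (i, j), n in prods.items() if i + j == k]
        if carry is not None:
            terms.append(carry)
        s = g.add("add", sum(g.nodes[t].value for t in terms), tuple(terms))
        out.append(g.add("mod", g.nodes[s].value % 10, (s,)))   # output digit
        carry = g.add("div", g.nodes[s].value // 10, (s,))      # carry to next column
    # Flush any remaining carry into extra output digits.
    while g.nodes[carry].value > 0:
        c = carry
        out.append(g.add("mod", g.nodes[c].value % 10, (c,)))
        carry = g.add("div", g.nodes[c].value // 10, (c,))
    return g, out


def read_result(g: ComputationGraph, out: list[int]) -> int:
    return sum(g.nodes[n].value * 10 ** k for k, n in enumerate(out))


def toy_all_correct_prob(g: ComputationGraph, p: float) -> float:
    # Toy assumption: each compute node is independently correct with probability p,
    # and the answer counts as correct only if every compute node is correct.
    return p ** g.num_compute_nodes()


if __name__ == "__main__":
    for x, y in [(7, 8), (123, 45), (98765, 4321)]:
        g, out = multiplication_graph(x, y)
        assert read_result(g, out) == x * y
        print(f"{x} x {y}: nodes={g.num_compute_nodes()} depth={g.depth()} "
              f"toy P(all correct | p=0.99)={toy_all_correct_prob(g, 0.99):.3f}")
```

In this sketch, the carry chain makes depth grow with the number of columns, and the number of product nodes grows with the product of the operand lengths. Under the toy model, even a per-node accuracy of 0.99 compounds across all of those nodes, so larger operands yield a lower all-correct probability.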
