Faith and Fate: Limits of Transformers on Compositionality

Abstract

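Transformer-based large language models (LLMs) have attracted attention for their strong performance on tasks that require complex multi-step reasoning. Yet the same models also fail on surprisingly simple problems. This raises the question of whether such errors are incidental or whether they point to more fundamental limitations. To understand the limits of Transformers more clearly, we investigate these models on three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require decomposing a problem into sub-steps and synthesizing those steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify their level of complexity, and we break reasoning steps down into intermediate sub-procedures. Our empirical findings suggest that Transformers solve compositional tasks by reducing multi-step compositional reasoning to linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off the empirical study, we present theoretical arguments on abstract multi-step reasoning problems, highlighting that Transformers' performance can degrade rapidly as task complexity increases.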
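As a rough illustration of what formulating a compositional task as a computation graph can look like, the sketch below builds a simplified graph for long multiplication and reports its size and depth. It is not the authors' formulation: the names `Node` and `multiplication_graph` are hypothetical, carries are not modeled as separate nodes, and the inputs are assumed to be non-negative integers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One vertex of the graph: an input digit or an intermediate operation."""
    op: str      # "input", "mul", or "sum"
    value: int
    parents: list = field(default_factory=list)

    def depth(self) -> int:
        # Length of the longest path from any input digit to this node.
        if not self.parents:
            return 0
        return 1 + max(p.depth() for p in self.parents)


def multiplication_graph(x: int, y: int):
    """Build a simplified long-multiplication graph (assumes x, y >= 0).

    Returns the node holding the final product and the list of all nodes.
    """
    xs = [Node("input", int(d)) for d in str(x)]
    ys = [Node("input", int(d)) for d in str(y)]
    nodes = xs + ys
    rows = []
    # One partial-product row per digit of y, least significant digit first.
    for j, b in enumerate(reversed(ys)):
        products = []
        for i, a in enumerate(reversed(xs)):
            # One-digit product scaled by its place value.
            products.append(Node("mul", a.value * b.value * 10 ** (i + j), [a, b]))
        row = Node("sum", sum(p.value for p in products), products)
        nodes += products + [row]
        rows.append(row)
    # Accumulate the rows one at a time, so depth grows with the digits of y.
    acc = rows[0]
    for row in rows[1:]:
        acc = Node("sum", acc.value + row.value, [acc, row])
        nodes.append(acc)
    return acc, nodes


if __name__ == "__main__":
    for x, y in [(7, 8), (12, 34), (123, 456)]:
        root, nodes = multiplication_graph(x, y)
        print(f"{x} x {y} = {root.value}: {len(nodes)} nodes, depth {root.depth()}")
```

In this toy version, the number of nodes and the depth both grow with the number of digits, which is the kind of complexity measure a graph formulation makes explicit.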

