The Coverage Principle: A Framework for Understanding Compositional Generalization

Abstract


Large language models excel at pattern matching, yet often fall short in systematic compositional generalization. We propose the coverage principle: a data-centric framework showing that models relying primarily on pattern matching for compositional tasks cannot reliably generalize beyond substituting fragments that yield identical results when used in the same contexts. We demonstrate that this framework has strong predictive power for the generalization capabilities of Transformers. First, we derive and empirically confirm that the training data required for two-hop generalization grows at least quadratically with the token set size, and that training data efficiency does not improve with 20x parameter scaling. Second, for compositional tasks with path ambiguity, where one variable affects the output through multiple computational paths, we show that Transformers learn context-dependent state representations that undermine both performance and interpretability. Third, Chain-of-Thought supervision improves training data efficiency for multi-hop tasks, but models still struggle with path ambiguity. Finally, we outline a mechanism-based taxonomy that distinguishes three ways neural networks can generalize: structure-based (bounded by coverage), property-based (leveraging algebraic invariances), and shared-operator (through function reuse). This conceptual lens contextualizes our results and highlights where new architectural ideas are needed to achieve systematic compositionality. Overall, the coverage principle provides a unified lens for understanding compositional reasoning and underscores the need for fundamental architectural or training innovations to achieve truly systematic compositionality.
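To make the substitution idea concrete, here is a minimal Python sketch of a coverage check on a toy two-hop task t = f2(f1(x1, x2), x3). It reflects a simplified reading of the principle, not the paper's exact construction: the primitives `F1` and `f2`, the token set {0, 1, 2}, the `min_shared` evidence threshold, and the helper names are all hypothetical, and equivalence is checked pairwise without taking a transitive closure. Two fragments (x1, x2) are treated as interchangeable when the training data show them producing identical outputs in every context x3 where both appear; a test input counts as covered if it was seen in training or is reachable by one such substitution.

```python
from collections import defaultdict

# Hypothetical toy primitives over the token set {0, 1, 2}.
# F1 maps the fragment (x1, x2) to a "bridge" value; f2 combines it with x3.
F1 = {(0, 0): 0, (1, 1): 0, (2, 2): 1}


def f2(bridge, x3):
    return (bridge + x3) % 3


def two_hop(x1, x2, x3):
    """Toy two-hop target: t = f2(f1(x1, x2), x3)."""
    return f2(F1[(x1, x2)], x3)


def observed_contexts(train):
    """Map each fragment (x1, x2) to the {x3: t} pairs seen in training."""
    table = defaultdict(dict)
    for (x1, x2, x3), t in train.items():
        table[(x1, x2)][x3] = t
    return table


def equivalent(ctx_a, ctx_b, min_shared=1):
    """Assumed criterion: the fragments share at least `min_shared` observed
    contexts and agree on the output in every shared context."""
    shared = ctx_a.keys() & ctx_b.keys()
    return len(shared) >= min_shared and all(ctx_a[c] == ctx_b[c] for c in shared)


def covered(test_input, train, min_shared=1):
    """True if test_input is in train, or if its fragment can be swapped for an
    equivalent fragment that was observed in the same context x3."""
    if test_input in train:
        return True
    x1, x2, x3 = test_input
    table = observed_contexts(train)
    ctx = table.get((x1, x2), {})  # .get does not insert into the defaultdict
    return any(
        prefix != (x1, x2) and x3 in other and equivalent(ctx, other, min_shared)
        for prefix, other in table.items()
    )


train = {x: two_hop(*x) for x in [(0, 0, 0), (1, 1, 0), (0, 0, 1)]}
print(covered((1, 1, 1), train))  # (1, 1) ~ (0, 0) via shared context x3 = 0
print(covered((2, 2, 1), train))  # fragment (2, 2) never observed
```

In this toy setup, (1, 1, 1) is covered because (1, 1) and (0, 0) agree on the shared context x3 = 0 and (0, 0) was seen with x3 = 1, whereas (2, 2, 1) is not, since nothing in the training data links (2, 2) to any other fragment. A pattern-matching learner, on this view, has no data-grounded basis for predicting the latter.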