

The Coverage Principle: A Framework for Understanding Compositional Generalization

May 26, 2025
Authors: Hoyeon Chang, Jinho Park, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo
cs.AI

Abstract

Large language models excel at pattern matching, yet often fall short in systematic compositional generalization. We propose the coverage principle: a data-centric framework showing that models relying primarily on pattern matching for compositional tasks cannot reliably generalize beyond substituting fragments that yield identical results when used in the same contexts. We demonstrate that this framework has strong predictive power for the generalization capabilities of Transformers. First, we derive and empirically confirm that the training data required for two-hop generalization grows at least quadratically with the token set size, and that training data efficiency does not improve with 20x parameter scaling. Second, for compositional tasks with path ambiguity, where one variable affects the output through multiple computational paths, we show that Transformers learn context-dependent state representations that undermine both performance and interpretability. Third, Chain-of-Thought supervision improves training data efficiency for multi-hop tasks but still struggles with path ambiguity. Finally, we outline a mechanism-based taxonomy that distinguishes three ways neural networks can generalize: structure-based (bounded by coverage), property-based (leveraging algebraic invariances), and shared-operator (through function reuse). This conceptual lens contextualizes our results and highlights where new architectural ideas are needed to achieve systematic compositionality. Overall, the coverage principle provides a unified lens for understanding compositional reasoning, and underscores the need for fundamental architectural or training innovations to achieve truly systematic compositionality.
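To make the coverage idea concrete, below is a minimal sketch (not the authors' code) of the two-hop setting the abstract describes. It assumes a task t = f2(f1(x1, x2), x3) over a vocabulary of n tokens, with f1 and f2 as random lookup tables standing in for the latent composition; the names f1, f2, target, and the merge rule (treating fragments as equivalent when they agree on every shared observed context) are illustrative assumptions, the last being only a coarse observational proxy for true functional equivalence.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical two-hop task: t = f2(f1(x1, x2), x3), with all tokens
# drawn from a vocabulary of size n. f1 and f2 are random lookup
# tables standing in for the latent ground-truth composition.
random.seed(0)
n = 8
V = list(range(n))
f1 = {(a, b): random.randrange(n) for a in V for b in V}
f2 = {(h, c): random.randrange(n) for h in V for c in V}

def target(a, b, c):
    return f2[(f1[(a, b)], c)]

all_triples = list(itertools.product(V, V, V))
train = random.sample(all_triples, 200)

# Record, for each fragment (x1, x2), the output it produced in each
# observed context x3. Agreement across shared contexts is the only
# evidence of functional equivalence available to a pattern matcher.
context_out = defaultdict(dict)
for a, b, c in train:
    context_out[(a, b)][c] = target(a, b, c)

# Union-find: merge fragments that agree on every shared context
# (a coarse observational proxy for true functional equivalence).
parent = {frag: frag for frag in context_out}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

for f, g in itertools.combinations(context_out, 2):
    shared = context_out[f].keys() & context_out[g].keys()
    if shared and all(context_out[f][c] == context_out[g][c] for c in shared):
        parent[find(f)] = find(g)

# A test triple lies within coverage if some training triple used an
# equivalent fragment in the same context x3.
seen = defaultdict(set)  # equivalence-class root -> contexts seen
for a, b, c in train:
    seen[find((a, b))].add(c)

test = [t for t in all_triples if t not in set(train)]
covered = sum(
    1 for a, b, c in test
    if (a, b) in parent and c in seen[find((a, b))]
)
print(f"covered test fraction: {covered / len(test):.2f}")
```

Shrinking the training sample or growing n drives the covered fraction down, which loosely illustrates the abstract's claim that the data needed for two-hop generalization grows at least quadratically with the token set size.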
