

The Coverage Principle: A Framework for Understanding Compositional Generalization

May 26, 2025
作者: Hoyeon Chang, Jinho Park, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo
cs.AI

Abstract

Large language models excel at pattern matching, yet often fall short in systematic compositional generalization. We propose the coverage principle: a data-centric framework showing that models relying primarily on pattern matching for compositional tasks cannot reliably generalize beyond substituting fragments that yield identical results when used in the same contexts. We demonstrate that this framework has strong predictive power for the generalization capabilities of Transformers. First, we derive and empirically confirm that the training data required for two-hop generalization grows at least quadratically with the token set size, and that training data efficiency does not improve with 20x parameter scaling. Second, for compositional tasks with path ambiguity, where one variable affects the output through multiple computational paths, we show that Transformers learn context-dependent state representations that undermine both performance and interoperability. Third, Chain-of-Thought supervision improves training data efficiency for multi-hop tasks but still struggles with path ambiguity. Finally, we outline a mechanism-based taxonomy that distinguishes three ways neural networks can generalize: structure-based (bounded by coverage), property-based (leveraging algebraic invariances), and shared-operator (through function reuse). This conceptual lens contextualizes our results and highlights where new architectural ideas are needed to achieve systematic compositionality. Overall, the coverage principle provides a unified lens for understanding compositional reasoning and underscores the need for fundamental architectural or training innovations to achieve truly systematic compositionality.
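To make the quadratic data requirement concrete, here is a minimal, hypothetical sketch (not the authors' code; the random lookup tables, token-set size, and sample count are illustrative assumptions). It builds a two-hop task t = f2(f1(x1, x2), x3) over a token set V and measures how many of the |V|^2 first-hop pairs a random training sample witnesses; under the coverage principle, a pattern-matching model can only be expected to generalize to inputs whose fragments are covered in this sense, which is what drives the at-least-quadratic growth in required training data.

```python
import itertools
import random

# Hypothetical two-hop task t = f2(f1(x1, x2), x3): both hops are
# random lookup tables over a token set V (an illustrative stand-in
# for the paper's setup, not the authors' actual task generator).
V = list(range(50))  # token set, |V| = 50
rng = random.Random(0)
f1 = {pair: rng.choice(V) for pair in itertools.product(V, V)}
f2 = {pair: rng.choice(V) for pair in itertools.product(V, V)}

def target(x1, x2, x3):
    """Ground-truth two-hop composition."""
    return f2[(f1[(x1, x2)], x3)]

# Sample a training set and check how many of the |V|^2 first-hop
# pairs it covers; uncovered pairs are the ones a pattern-matching
# learner cannot be expected to handle at test time.
train = [(rng.choice(V), rng.choice(V), rng.choice(V)) for _ in range(2000)]
covered = {(x1, x2) for x1, x2, _ in train}
print(f"first-hop coverage: {len(covered)}/{len(V) ** 2} pairs "
      f"({len(covered) / len(V) ** 2:.1%})")
```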
