Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

October 6, 2025
Authors: Mohammad Mahdi Samiei Paqaleh, Arash Marioriyad, Arman Tahmasebi-Zadeh, Mohamadreza Fereydooni, Mahdi Ghaznavai, Mahdieh Soleymani Baghshah
cs.AI

Abstract

Recent progress has pushed AI frontiers from pattern recognition tasks toward problems that require step-by-step, System 2-style reasoning, especially with large language models. Yet, unlike learning, where the concepts of generalization and out-of-distribution (OoD) evaluation are well formalized, reasoning ability has no clear, consistent definition or metric. We propose Complexity Out of Distribution (Complexity OoD) generalization as a framework and problem setting for defining and measuring reasoning. A model exhibits Complexity OoD generalization when it maintains performance on test instances whose minimal required solution complexity, either representational (richer solution structure) or computational (more reasoning steps or longer programs), exceeds that of all training examples. We formalize complexity via the Kolmogorov complexity of solution descriptions and via operational proxies (e.g., object/relation counts; reasoning-step counts), clarifying how Complexity OoD differs from length and compositional OoD. This lens unifies learning and reasoning: many problems solvable with System 1-like processing at low complexity become System 2-like under complexity pressure, while System 2 can be viewed as generalization over solution structures. We translate this perspective into practice with recommendations for operationalizing Complexity OoD across the stack: incorporating complexity into benchmark and evaluation-metric design; rethinking supervision to target solution traces; seeking and designing inductive biases for Complexity OoD generalization; and addressing learning-to-reason spillovers such as spurious shortcuts, semantic robustness, catastrophic forgetting, and step-wise calibration. Because Complexity OoD cannot be solved by scaling data alone, progress toward robust reasoning will require architectures and training regimes that explicitly model complexity and allocate computation according to it.
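The Complexity OoD condition in the abstract (every test instance needs a more complex minimal solution than any training instance) can be illustrated with a small evaluation-split sketch. This is not the authors' code: the `Instance` record, its fields, and the helpers `step_count`, `structure_size`, `complexity_ood_split`, and `accuracy_by_complexity` are hypothetical. Because Kolmogorov complexity is uncomputable, the sketch assumes each instance is already annotated with the operational proxies the abstract mentions (object/relation counts as a representational proxy, reasoning-step counts as a computational proxy).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical task instance annotated with operational complexity proxies."""
    uid: str
    num_objects: int    # representational proxy: objects in the minimal solution
    num_relations: int  # representational proxy: relations among those objects
    num_steps: int      # computational proxy: reasoning steps in the minimal solution


def step_count(x: Instance) -> int:
    """Computational proxy: number of reasoning steps."""
    return x.num_steps


def structure_size(x: Instance) -> int:
    """Representational proxy: object count plus relation count."""
    return x.num_objects + x.num_relations


def complexity_ood_split(
    data: Sequence[Instance],
    proxy: Callable[[Instance], int],
    train_max: int,
) -> Tuple[List[Instance], List[Instance]]:
    """Train on proxy <= train_max; test only on strictly more complex instances."""
    train = [x for x in data if proxy(x) <= train_max]
    test = [x for x in data if proxy(x) > train_max]
    # Sanity check of the defining condition: every test instance exceeds
    # the complexity of *all* training instances under the chosen proxy.
    if train and test:
        assert min(map(proxy, test)) > max(map(proxy, train))
    return train, test


def accuracy_by_complexity(
    data: Sequence[Instance],
    correct: Dict[str, bool],
    proxy: Callable[[Instance], int],
) -> Dict[int, float]:
    """Accuracy per complexity level, to check whether performance is maintained."""
    totals: Dict[int, int] = {}
    hits: Dict[int, int] = {}
    for x in data:
        c = proxy(x)
        totals[c] = totals.get(c, 0) + 1
        hits[c] = hits.get(c, 0) + int(correct[x.uid])
    return {c: hits[c] / totals[c] for c in sorted(totals)}


if __name__ == "__main__":
    data = [
        Instance("a", 2, 1, 1),
        Instance("b", 3, 2, 2),
        Instance("c", 4, 4, 3),
        Instance("d", 6, 7, 5),
        Instance("e", 8, 9, 7),
    ]
    train, test = complexity_ood_split(data, step_count, train_max=3)
    print([x.uid for x in train], [x.uid for x in test])  # ['a', 'b', 'c'] ['d', 'e']
    # Hypothetical model correctness on the held-out, more complex instances.
    preds = {"d": True, "e": False}
    print(accuracy_by_complexity(test, preds, step_count))  # {5: 1.0, 7: 0.0}
```

Under these assumptions, the choice of proxy decides what the split measures: splitting on raw input length instead would yield a length-OoD split, and reporting accuracy per complexity level (rather than one pooled number) shows whether performance is actually maintained as complexity grows.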
October 13, 2025