
Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

October 6, 2025
Authors: Mohammad Mahdi Samiei Paqaleh, Arash Marioriyad, Arman Tahmasebi-Zadeh, Mohamadreza Fereydooni, Mahdi Ghaznavai, Mahdieh Soleymani Baghshah
cs.AI

Abstract

Recent progress has pushed the AI frontier from pattern-recognition tasks toward problems that require step-by-step, System-2-style reasoning, especially with large language models. Yet, unlike learning, where generalization and out-of-distribution (OoD) evaluation are well formalized, reasoning ability has no clear, consistent definition or metric. We propose Complexity Out-of-Distribution (Complexity OoD) generalization as a framework and problem setting for defining and measuring reasoning. A model exhibits Complexity OoD generalization when it maintains performance on test instances whose minimal required solution complexity, either representational (richer solution structure) or computational (more reasoning steps or greater program length), exceeds that of all training examples. We formalize complexity via the Kolmogorov complexity of the solution description and via operational proxies (e.g., object/relation counts; reasoning-step counts), and we clarify how Complexity OoD differs from length OoD and compositional OoD. This lens unifies learning and reasoning: many cases solvable with System-1-like processing at low complexity become System-2-like under complexity pressure, while System 2 can be viewed as generalization over solution structures. We translate this perspective into practice with recommendations for operationalizing Complexity OoD across the stack: incorporating complexity into benchmark and evaluation-metric design; rethinking supervision to target solution traces; seeking and designing inductive biases for Complexity OoD generalization; and addressing learning-to-reason spillovers such as spurious shortcuts, semantic robustness, catastrophic forgetting, and step-wise calibration. Because Complexity OoD cannot be solved by scaling data alone, progress toward robust reasoning will require architectures and training regimes that explicitly model complexity and allocate computation with respect to it.
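The Complexity OoD condition described in the abstract (every test instance requires more solution complexity than any training instance) can be illustrated with a small evaluation sketch. This is a minimal illustration under stated assumptions, not the authors' benchmark or code: the `Example` type, the helpers `make_example`, `complexity_ood_split`, and `accuracy_by_complexity`, the chained-addition task, and the `shortcut_model` are all hypothetical, and the reasoning-step count stands in for one of the operational proxies the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Example:
    question: str
    answer: str
    steps: int  # hypothetical complexity proxy: number of reasoning steps


def make_example(operands: List[int]) -> Example:
    """Build a chained-addition item; each '+' counts as one reasoning step."""
    return Example(
        question="+".join(str(x) for x in operands),
        answer=str(sum(operands)),
        steps=len(operands) - 1,
    )


def complexity_ood_split(
    examples: List[Example], max_train_complexity: int
) -> Tuple[List[Example], List[Example]]:
    """Put every item above the complexity threshold in the test set,
    so test complexity strictly exceeds that of all training items."""
    train = [ex for ex in examples if ex.steps <= max_train_complexity]
    test = [ex for ex in examples if ex.steps > max_train_complexity]
    return train, test


def accuracy_by_complexity(
    model: Callable[[str], str], examples: List[Example]
) -> Dict[int, float]:
    """Report accuracy separately for each complexity level."""
    totals: Dict[int, int] = {}
    correct: Dict[int, int] = {}
    for ex in examples:
        totals[ex.steps] = totals.get(ex.steps, 0) + 1
        if model(ex.question) == ex.answer:
            correct[ex.steps] = correct.get(ex.steps, 0) + 1
    return {k: correct.get(k, 0) / totals[k] for k in sorted(totals)}


def shortcut_model(question: str) -> str:
    """Toy stand-in for a model that only handles up to two additions:
    it silently ignores every operand after the third."""
    operands = [int(tok) for tok in question.split("+")]
    return str(sum(operands[:3]))


# Items with 1 to 5 reasoning steps: "1+2", "1+2+3", ..., "1+2+3+4+5+6".
examples = [make_example(list(range(1, n + 2))) for n in range(1, 6)]
train, test = complexity_ood_split(examples, max_train_complexity=2)

in_dist_accuracy = accuracy_by_complexity(shortcut_model, train)
ood_accuracy = accuracy_by_complexity(shortcut_model, test)
print("in-distribution:", in_dist_accuracy)
print("complexity OoD:", ood_accuracy)
```

In this toy setting the shortcut model is perfect on the training complexity range and fails on every higher-complexity item, which is the kind of gap an evaluation aggregated over all complexity levels could hide. Reporting accuracy per complexity level, rather than as a single number, is one simple way to make complexity part of the metric design.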