Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

Abstract

Recent progress has pushed AI frontiers from pattern-recognition tasks toward problems that require step-by-step, System-2-style reasoning, especially with large language models. Yet, unlike learning, where generalization and out-of-distribution (OoD) evaluation concepts are well formalized, there is no clear, consistent definition or metric for reasoning ability. We propose Complexity Out-of-Distribution (Complexity OoD) generalization as a framework and problem setting to define and measure reasoning. A model exhibits Complexity OoD generalization when it maintains performance on test instances whose minimal required solution complexity, either representational (richer solution structure) or computational (more reasoning steps or greater program length), exceeds that of all training examples. We formalize complexity via the Kolmogorov complexity of the solution description and via operational proxies (e.g., object/relation counts, reasoning-step counts), clarifying how Complexity OoD differs from length OoD and compositional OoD. This lens unifies learning and reasoning: many cases solvable with System-1-like processing at low complexity become System-2-like under complexity pressure, while System 2 can be viewed as generalization over solution structures. We translate this perspective into practice with recommendations for operationalizing Complexity OoD across the stack: incorporating complexity into benchmark and evaluation-metric design, rethinking supervision to target solution traces, seeking and designing inductive biases for Complexity OoD generalization, and addressing learning-to-reason spillovers such as spurious shortcuts, semantic robustness, catastrophic forgetting, and step-wise calibration. Because Complexity OoD cannot be solved by scaling data alone, progress toward robust reasoning will require architectures and training regimes that explicitly model and allocate computation with respect to complexity.
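As an illustration only, and not the authors' benchmark or code, the definition above can be sketched as an evaluation split in which every test instance needs strictly more reasoning steps than any training instance. The step count is used here as the operational complexity proxy. The names `Example`, `complexity_ood_split`, `accuracy_by_complexity`, and the toy addition task are all hypothetical, introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    """A toy instance; `steps` is an assumed proxy for solution complexity."""
    prompt: str
    answer: str
    steps: int  # number of additions needed to solve the prompt


def make_sum_example(n_terms: int) -> Example:
    """Build the prompt '1+2+...+n'; solving it takes n_terms - 1 additions."""
    terms = list(range(1, n_terms + 1))
    return Example("+".join(map(str, terms)), str(sum(terms)), n_terms - 1)


def complexity_ood_split(
    examples: Sequence[Example], max_train_steps: int
) -> Tuple[List[Example], List[Example]]:
    """Train on instances at or below the threshold, test strictly above it."""
    train = [ex for ex in examples if ex.steps <= max_train_steps]
    test = [ex for ex in examples if ex.steps > max_train_steps]
    return train, test


def accuracy_by_complexity(
    predict: Callable[[str], str], examples: Sequence[Example]
) -> Dict[int, float]:
    """Report accuracy per complexity level rather than one pooled number."""
    totals: Dict[int, int] = {}
    correct: Dict[int, int] = {}
    for ex in examples:
        totals[ex.steps] = totals.get(ex.steps, 0) + 1
        if predict(ex.prompt) == ex.answer:
            correct[ex.steps] = correct.get(ex.steps, 0) + 1
    return {k: correct.get(k, 0) / totals[k] for k in sorted(totals)}


def exact_predictor(prompt: str) -> str:
    """Stand-in for a model that allocates as many steps as the input needs."""
    return str(sum(int(t) for t in prompt.split("+")))


def capped_predictor(prompt: str) -> str:
    """Stand-in for a model whose computation is capped at the training depth."""
    return str(sum(int(t) for t in prompt.split("+")[:4]))


examples = [make_sum_example(n) for n in range(2, 7)]  # steps 1 through 5
train, test = complexity_ood_split(examples, max_train_steps=3)
```

In this toy setting both predictors are perfect on the training complexities, and only the per-complexity report on `test` separates them. That is the kind of gap a pooled in-distribution score would hide.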
