The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning

Abstract

The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations. Yet little is known about the limits of such latent reasoning in large language models (LLMs). We test these limits by studying whether models can discover multi-step planning strategies without supervision on intermediate steps and execute them latently, within a single forward pass. Using graph path-finding tasks that precisely control the number of required latent planning steps, we uncover a striking limitation that massive scaling does not resolve: tiny transformers trained from scratch discover strategies requiring up to three latent steps, fine-tuned GPT-4o and Qwen3-32B reach five, and GPT-5.4 attains seven under few-shot prompting. Although the maximum latent planning depth that models can learn during training is five, the discovered strategy generalizes to up to eight latent steps at test time. This reveals a dissociation between the ability to discover a latent strategy under final-answer supervision alone and the ability to execute it once discovered. If similar limits hold more broadly, strategies requiring multiple coordinated latent planning steps may need to be explicitly taught or externalized, lending credence to CoT monitoring.
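To make the idea of a depth-controlled path-finding task concrete, the following is a minimal sketch of one possible construction. It is an illustrative assumption, not the task used in the paper: the abstract does not specify the graph family, the prompt format, or how latent steps are counted. Here a hypothetical `make_instance` builds a star-shaped graph whose branches all have the same length `depth`, and the label to predict is the first node on the path from the source to the target. The hop count returned by the hypothetical `solve` helper is used only as a stand-in for the number of planning steps an answer-only model would need to carry out internally.

```python
import random


def make_instance(depth, branches=2, seed=0):
    """Build a star-shaped graph (hypothetical task design, not the paper's).

    `branches` disjoint chains of `depth` edges each leave a shared source.
    The target is the end of one chain; the answer is the first node on
    the path from the source to that target.
    """
    rng = random.Random(seed)
    labels = list(range(1 + branches * depth))
    rng.shuffle(labels)  # random node ids so the labels leak no structure
    source = labels[0]
    edges, leaves, first_hops = [], [], []
    idx = 1
    for _ in range(branches):
        prev = source
        for step in range(depth):
            node = labels[idx]
            idx += 1
            if step == 0:
                first_hops.append(node)
            edges.append((prev, node))
            prev = node
        leaves.append(prev)
    chosen = rng.randrange(branches)
    rng.shuffle(edges)  # edge order should not reveal the branch layout
    return {
        "edges": edges,
        "source": source,
        "target": leaves[chosen],
        "answer": first_hops[chosen],
    }


def solve(edges, source, target):
    """Reference solver: walk back from the target to the source.

    Returns (first node on the path, number of edges on the path).
    Assumes every non-source node has exactly one parent, which holds
    for graphs produced by make_instance.
    """
    parent = {child: par for par, child in edges}
    node, hops = target, 1
    while parent[node] != source:
        node = parent[node]
        hops += 1
    return node, hops


if __name__ == "__main__":
    inst = make_instance(depth=4, branches=3, seed=1)
    first, hops = solve(inst["edges"], inst["source"], inst["target"])
    print(first == inst["answer"], hops)
```

Under final-answer supervision, a model would see only the edge list, source, and target as input and `answer` as the label. Varying `depth` then varies how far back the reference solver has to trace, which is the sense in which the required number of steps is controlled in this sketch.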

