The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
April 7, 2026
Authors: Yi Xu, Philipp Jettkant, Laura Ruis
cs.AI
Abstract
The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations. Yet little is known about the limits of such latent reasoning in LLMs. We test these limits by studying whether models can discover multi-step planning strategies without supervision on intermediate steps and execute them latently, within a single forward pass. Using graph path-finding tasks that precisely control the number of required latent planning steps, we uncover a striking limitation unresolved by massive scaling: tiny transformers trained from scratch discover strategies requiring up to three latent steps, fine-tuned GPT-4o and Qwen3-32B reach five, and GPT-5.4 attains seven under few-shot prompting. Although the maximum latent planning depth models can learn during training is five, the discovered strategy generalizes up to eight latent steps at test time. This reveals a dissociation between the ability to discover a latent strategy under final-answer supervision alone and the ability to execute it once discovered. If similar limits hold more broadly, strategies requiring multiple coordinated latent planning steps may need to be explicitly taught or externalized, lending credence to CoT monitoring.
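The abstract does not give the paper's exact task format, but a graph path-finding task with a controllable number of planning steps can be sketched as follows. This is a minimal, hypothetical construction (the function name, edge encoding, and distractor scheme are assumptions, not the authors' setup): the instance is a chain of `depth` edges from source to goal, shuffled together with distractor edges, so that producing the answer requires `depth` planning steps with no supervised intermediate steps.

```python
import random

def make_pathfinding_task(depth, n_distractors=4, seed=0):
    """Build a toy path-finding instance whose solution requires `depth`
    planning steps: a chain of `depth` edges from source to goal, mixed
    with distractor edges over disjoint node ids, presented in shuffled
    order. Only the final path is supervised. (Hypothetical format.)"""
    rng = random.Random(seed)
    chain = list(range(depth + 1))                      # 0 -> 1 -> ... -> depth
    edges = [(chain[i], chain[i + 1]) for i in range(depth)]
    base = depth + 1                                    # distractors share no nodes with the chain
    for i in range(n_distractors):
        edges.append((base + 2 * i, base + 2 * i + 1))
    rng.shuffle(edges)
    prompt = " ".join(f"{a}>{b}" for a, b in edges) + f" | {chain[0]}->{chain[-1]} ?"
    answer = " ".join(map(str, chain))                  # full source-to-goal path
    return prompt, answer

prompt, answer = make_pathfinding_task(depth=3)
```

Varying `depth` while holding the format fixed is what lets such a task isolate latent planning depth as the single controlled variable.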