The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning

April 7, 2026
作者: Yi Xu, Philipp Jettkant, Laura Ruis
cs.AI

Abstract

The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations, yet little is known about the limits of such latent reasoning in LLMs. We test these limits by studying whether models can discover multi-step planning strategies without supervision on intermediate steps and execute them latently within a single forward pass. Using graph path-finding tasks that precisely control the number of required latent planning steps, we uncover a striking limitation unresolved by massive scaling: tiny transformers trained from scratch discover strategies requiring up to three latent steps, fine-tuned GPT-4o and Qwen3-32B reach five, and GPT-5.4 attains seven under few-shot prompting. Although the maximum latent planning depth models can learn during training is five, the discovered strategy generalizes up to eight latent steps at test time. This reveals a dissociation between the ability to discover a latent strategy under final-answer supervision alone and the ability to execute it once discovered. If similar limits hold more broadly, strategies requiring multiple coordinated latent planning steps may need to be explicitly taught or externalized, lending credence to CoT monitoring.
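The abstract does not spell out how the graph path-finding tasks fix the number of required planning steps. One plausible construction, sketched below under our own assumptions (the paper's actual task format may differ), is a rooted tree whose depth sets the planning budget: the prompt lists the edges in shuffled order and names a goal leaf, so emitting even the first correct move from the root requires planning over `depth` edges, with no intermediate step ever supervised.

```python
import random

def make_pathfinding_task(depth, branching=2, seed=0):
    """Build a rooted tree of the given depth and return a prompt
    (shuffled edge list plus a goal leaf) and the ground-truth
    root-to-goal path.  `depth` controls how many latent planning
    steps are needed.  Hypothetical construction, not the paper's."""
    rng = random.Random(seed)
    edges, frontier, next_id = [], [0], 1
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            for _ in range(branching):
                edges.append((node, next_id))
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    goal = rng.choice(frontier)          # goal is always a leaf
    rng.shuffle(edges)                   # hide the tree structure
    # Recover the unique root-to-goal path by walking parent links.
    parent = {child: par for par, child in edges}
    path, node = [goal], goal
    while node != 0:
        node = parent[node]
        path.append(node)
    path.reverse()
    prompt = " ".join(f"{p}->{c}" for p, c in edges) + f" goal:{goal}"
    return prompt, path

prompt, path = make_pathfinding_task(depth=3)
```

Training would supervise only the final answer (the path, or just its first edge), so a model that solves the task must have discovered the multi-step lookahead strategy on its own.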
PDF | April 10, 2026