Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization
January 29, 2026
Authors: Jiecong Wang, Hao Peng, Chunyang Liu
cs.AI
Abstract
Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but it remains constrained by high computational cost and reasoning-path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to improve efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a pre-defined number of latent steps at inference time. In this work, we introduce PLaT (Planning with Latent Thoughts), a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to decide dynamically when to terminate reasoning, rather than relying on a fixed hyperparameter. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a broader, more robust solution space, offering a transparent and scalable foundation for inference-time search.
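To make the control flow concrete, the sketch below illustrates the decoupling the abstract describes: a deterministic latent transition rolls out planning states, a halting head decides dynamically when to stop, and a separate decoder verbalizes the final thought. This is a minimal toy sketch with random weights and hypothetical names (`plan`, `verbalize`, `w_halt`), not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy latent dimension

W_trans = rng.standard_normal((DIM, DIM)) * 0.3  # deterministic latent transition
w_halt = rng.standard_normal(DIM)                # halting-score head
W_dec = rng.standard_normal((4, DIM))            # toy decoder over a 4-token vocab

def plan(z0, max_steps=16, halt_threshold=0.0):
    """Roll out latent planning states until the halting head fires.

    Termination is decided by the model's own halting score, not by a
    fixed number of latent steps supplied as a hyperparameter.
    """
    z = z0
    trajectory = [z]
    for _ in range(max_steps):
        z = np.tanh(W_trans @ z)           # one deterministic latent step
        trajectory.append(z)
        if w_halt @ z > halt_threshold:    # model decides to terminate
            break
    return trajectory

def verbalize(z):
    """Separate decoder grounds a latent thought into a token id."""
    return int(np.argmax(W_dec @ z))

traj = plan(rng.standard_normal(DIM))
token = verbalize(traj[-1])  # decode only when verbalization is needed
print(len(traj), token)
```

Because the decoder is separate, verbalization can be invoked only for the states one wants to inspect, which is what makes the latent trajectory transparent and amenable to inference-time search over alternative rollouts.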