Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Abstract

Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but when grounded in discrete token spaces it remains constrained by computational cost and reasoning path collapse. Recent latent reasoning approaches attempt to improve efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a pre-defined number of latent steps during inference. In this work, we introduce **PLaT (Planning with Latent Thoughts)**, a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to dynamically determine when to terminate reasoning rather than relying on fixed hyperparameters. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a robust, broader solution space, offering a transparent and scalable foundation for inference-time search.
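The decoupling described in the abstract can be illustrated with a toy sketch. This is not the PLaT architecture: the names `plan`, `verbalize`, `halve`, and `near_zero` are hypothetical, the latent state is a plain list of floats, and the transition and stopping rules are hand-written stand-ins for what would presumably be learned components. It shows only the control flow: a deterministic latent rollout that decides for itself when to stop, with a separate decoder called only when text is needed.

```python
from typing import Callable, List

State = List[float]  # toy latent planning state (assumption: a real one would be a hidden vector)


def plan(initial: State,
         step: Callable[[State], State],
         should_stop: Callable[[State], bool],
         max_steps: int = 64) -> List[State]:
    """Roll out a deterministic latent trajectory until should_stop fires.

    max_steps is only a safety cap, not a prescribed number of latent steps.
    """
    trajectory = [initial]
    for _ in range(max_steps):
        if should_stop(trajectory[-1]):
            break
        trajectory.append(step(trajectory[-1]))
    return trajectory


def verbalize(state: State) -> str:
    """Stand-in decoder: renders a latent state as text, only when asked."""
    return "state(" + ", ".join(f"{x:.2f}" for x in state) + ")"


def halve(state: State) -> State:
    """Toy deterministic transition: halve every coordinate."""
    return [x / 2 for x in state]


def near_zero(state: State) -> bool:
    """Toy termination rule: stop once every coordinate is small."""
    return max(abs(x) for x in state) < 0.1


trajectory = plan([1.0, -0.5], halve, near_zero)
print(len(trajectory))            # number of latent states visited
print(verbalize(trajectory[-1]))  # text is produced only on request
```

The rollout length depends on the starting state rather than on a fixed step count, and `verbalize` never influences the trajectory, which is the separation the abstract emphasizes.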
