Latent Reasoning with Normalizing Flows

Abstract

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
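To illustrate what "exact likelihoods" and "probabilistic left-to-right decoding" mean for a flow over continuous thoughts, the following is a minimal sketch of a single causal affine flow step, assuming a toy conditioner in place of the LLM backbone. It is not the NF-CoT implementation: the class `ToyCausalAffineFlow`, its weights `w_mu` and `w_alpha`, and the mean-pooled context are hypothetical choices made only for illustration. Each thought vector is shifted and scaled using parameters computed from strictly earlier positions, so the Jacobian is triangular and the log-likelihood follows from the change-of-variables formula.

```python
import numpy as np


class ToyCausalAffineFlow:
    """One affine autoregressive flow step over a sequence of latent vectors.

    Illustrative only: the conditioner is a fixed random linear map, standing
    in for the causal Transformer that a real model would use.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Hypothetical conditioner weights (assumption, not from the paper).
        self.w_mu = rng.normal(scale=0.5, size=(dim, dim))
        self.w_alpha = rng.normal(scale=0.5, size=(dim, dim))

    def _params(self, prefix):
        # Summarize the strictly earlier positions x_{<t}; empty prefix -> zeros.
        ctx = prefix.mean(axis=0) if len(prefix) else np.zeros(self.dim)
        mu = ctx @ self.w_mu
        alpha = np.tanh(ctx @ self.w_alpha)  # bounded log-scale
        return mu, alpha

    def forward(self, x):
        """Map thoughts x of shape (T, D) to noise z; return (z, exact log p(x))."""
        z = np.empty_like(x)
        log_det = 0.0
        for t in range(len(x)):
            mu, alpha = self._params(x[:t])
            z[t] = (x[t] - mu) * np.exp(-alpha)
            log_det -= alpha.sum()  # log |det dz/dx| for this position
        # Standard normal base density evaluated at z.
        log_base = -0.5 * (z ** 2).sum() - 0.5 * z.size * np.log(2 * np.pi)
        return z, log_base + log_det

    def inverse(self, z):
        """Sample-side direction: generate x left to right from noise z."""
        x = np.empty_like(z)
        for t in range(len(z)):
            mu, alpha = self._params(x[:t])  # depends only on already generated x
            x[t] = z[t] * np.exp(alpha) + mu
        return x


flow = ToyCausalAffineFlow(dim=4)
noise = np.random.default_rng(1).standard_normal((3, 4))
thoughts = flow.inverse(noise)               # stochastic left-to-right generation
recovered, log_p = flow.forward(thoughts)    # exact log-likelihood of the sample
```

Because `inverse` consumes only positions that have already been generated, sampling proceeds left to right like ordinary autoregressive decoding, and `forward` returns a scalar log-likelihood that could, in principle, serve as the log-probability term in a policy-gradient objective.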