Latent Reasoning with Normalizing Flows

Abstract

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), which highlights the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
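To make the "exact likelihood" and "left-to-right sampling" properties concrete, the following is a minimal sketch of a toy affine autoregressive flow over a short sequence of continuous thoughts. It is not the NF-CoT architecture: the conditioner below (a running mean of earlier thoughts passed through two hypothetical weight matrices, `W_mu` and `W_s`) merely stands in for the causal LLM backbone, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def shift_and_log_scale(z_prev, W_mu, W_s):
    # Hypothetical causal conditioner: summarizes only earlier thoughts.
    d = W_mu.shape[0]
    c = z_prev.mean(axis=0) if len(z_prev) else np.zeros(d)
    # tanh bounds the log-scale so exp() below stays well-behaved.
    return c @ W_mu, np.tanh(c @ W_s)

def log_prob(z, W_mu, W_s):
    # Exact log-density via the change-of-variables formula:
    # log p(z) = sum_t [ log N(u_t; 0, I) - sum(log_s_t) ],
    # where u_t = (z_t - mu_t) * exp(-log_s_t).
    T, d = z.shape
    total = 0.0
    for t in range(T):
        mu, log_s = shift_and_log_scale(z[:t], W_mu, W_s)
        u = (z[t] - mu) * np.exp(-log_s)
        total += -0.5 * (u @ u + d * np.log(2 * np.pi)) - log_s.sum()
    return total

def sample(T, W_mu, W_s, rng):
    # Left-to-right sampling: each thought depends only on earlier ones.
    d = W_mu.shape[0]
    z = np.zeros((0, d))
    for _ in range(T):
        mu, log_s = shift_and_log_scale(z, W_mu, W_s)
        u = rng.standard_normal(d)
        z = np.vstack([z, mu + np.exp(log_s) * u])
    return z

rng = np.random.default_rng(0)
d, T = 4, 3
W_mu = 0.1 * rng.standard_normal((d, d))
W_s = 0.1 * rng.standard_normal((d, d))
z = sample(T, W_mu, W_s, rng)
lp = log_prob(z, W_mu, W_s)
```

Because `log_prob` is exact rather than a bound, a score-function (policy-gradient) estimator could in principle weight its gradient by a task reward directly; that step is omitted here, as is any interaction with a KV cache.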