Latent Reasoning with Normalizing Flows
June 4, 2026
Authors: Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang, Haoqiang Kang, Lianhui Qin, Yizhe Zhang, Jiatao Gu
cs.AI
Abstract
Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.
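To illustrate why an autoregressive, TARFlow-style normalizing flow gives both exact likelihoods and left-to-right sampling, here is a minimal sketch of a single affine autoregressive flow block over a sequence of continuous thought vectors. It is not the NF-CoT implementation. The `conditioner` function is a hypothetical toy stand-in for the causal LLM backbone that, in the real model, would predict a shift and log-scale from the prefix; the names `conditioner`, `log_likelihood`, and `sample` are illustrative assumptions, and a practical model would stack several such blocks.

```python
import math
import random

# Hypothetical toy conditioner standing in for the causal transformer:
# it maps the prefix x_<t to a per-dimension shift mu_t and log-scale alpha_t.
def conditioner(prefix, dim):
    if not prefix:
        return [0.0] * dim, [0.0] * dim
    # Shift = mean of previous thought vectors; log-scale = bounded function of it.
    mu = [sum(v[d] for v in prefix) / len(prefix) for d in range(dim)]
    alpha = [0.1 * math.tanh(m) for m in mu]
    return mu, alpha

def log_likelihood(xs):
    """Exact log p(x_1..x_T) under one autoregressive affine flow block."""
    dim = len(xs[0])
    total = 0.0
    for t, x in enumerate(xs):
        mu, alpha = conditioner(xs[:t], dim)
        # Forward (data -> noise) map: z = (x - mu) * exp(-alpha).
        z = [(xi - mi) * math.exp(-ai) for xi, mi, ai in zip(x, mu, alpha)]
        # Standard-normal log-density of z.
        log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
        # Change of variables: log|det dz/dx| = -sum(alpha) (triangular Jacobian).
        total += log_base - sum(alpha)
    return total

def sample(num_steps, dim, rng):
    """Left-to-right sampling: invert the flow one position at a time."""
    xs = []
    for _ in range(num_steps):
        mu, alpha = conditioner(xs, dim)
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Inverse map: x = z * exp(alpha) + mu, using only already-generated x_<t.
        xs.append([zi * math.exp(ai) + mi for zi, mi, ai in zip(z, mu, alpha)])
    return xs

if __name__ == "__main__":
    rng = random.Random(0)
    thoughts = sample(num_steps=4, dim=3, rng=rng)
    print(len(thoughts), log_likelihood(thoughts))
```

Because each position depends only on earlier positions, sampling proceeds in the same causal order as ordinary token decoding, which is what makes reuse of a KV cache plausible in the full model. And because `log_likelihood` is exact, it could in principle serve directly as the log-probability of sampled thoughts in a REINFORCE-style policy-gradient estimator.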