Parallel Test-Time Scaling for Latent Reasoning Models
October 9, 2025
Authors: Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li
cs.AI
Abstract
Parallel test-time scaling (TTS) is a pivotal approach for enhancing large
language models (LLMs), typically by sampling multiple token-based
chains-of-thought in parallel and aggregating outcomes through voting or
search. Recent advances in latent reasoning, where intermediate reasoning
unfolds in continuous vector spaces, offer a more efficient alternative to
explicit Chain-of-Thought, yet whether such latent models can similarly benefit
from parallel TTS remains open, mainly due to the absence of sampling
mechanisms in continuous space, and the lack of probabilistic signals for
advanced trajectory aggregation. This work enables parallel TTS for latent
reasoning models by addressing both issues. For sampling, we introduce two
uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive
Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM)
trained with a step-wise contrastive objective to score and guide latent
reasoning. Extensive experiments and visualization analyses show that both
sampling strategies scale effectively with compute and exhibit distinct
exploration dynamics, while LatentRM enables effective trajectory selection.
Together, our explorations open a new direction for scalable inference in
continuous spaces. Code released at https://github.com/YRYangang/LatentTTS.
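The two sampling strategies named in the abstract can be illustrated with a minimal toy sketch. This is not the paper's implementation (see the linked repository for that): the function names, the per-vector treatment of a latent state, and the hyperparameters `sigma` and `p` are all illustrative assumptions. The sketch only shows the core idea, that stochastic perturbations of a continuous latent vector yield diverse parallel trajectories where token-level sampling is unavailable.

```python
import random

def additive_gaussian_noise(latent, sigma=0.1, seed=None):
    """Perturb a latent reasoning vector with additive Gaussian noise.

    `sigma` is a hypothetical noise scale; the abstract does not
    specify the paper's actual hyperparameters.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in latent]

def mc_dropout(latent, p=0.1, seed=None):
    """Monte Carlo Dropout: keep dropout active at inference time,
    zeroing random latent dimensions and rescaling the survivors
    (inverted dropout), so each forward pass is stochastic.
    """
    rng = random.Random(seed)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in latent]

# Parallel TTS in this setting: apply a stochastic perturbation
# independently per trajectory to obtain N distinct latent rollouts.
latent = [0.5, -1.2, 0.3, 0.8]
samples = [additive_gaussian_noise(latent, sigma=0.1, seed=i) for i in range(4)]
```

In a real model, the perturbation would be applied to the hidden state at each latent reasoning step (or realized by leaving dropout layers in training mode), and the resulting trajectories would then be scored by an aggregator such as the proposed LatentRM.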