

Parallel Test-Time Scaling for Latent Reasoning Models

October 9, 2025
Authors: Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li
cs.AI

Abstract

Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs), typically by sampling multiple token-based chains-of-thought in parallel and aggregating outcomes through voting or search. Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought. Yet whether such latent models can similarly benefit from parallel TTS remains an open question, mainly due to the absence of sampling mechanisms in continuous space and the lack of probabilistic signals for advanced trajectory aggregation. This work enables parallel TTS for latent reasoning models by addressing these issues. For sampling, we introduce two uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM) trained with a step-wise contrastive objective to score and guide latent reasoning. Extensive experiments and visualization analyses show that both sampling strategies scale effectively with compute and exhibit distinct exploration dynamics, while LatentRM enables effective trajectory selection. Together, our explorations open a new direction for scalable inference in continuous spaces. Code released at https://github.com/YRYangang/LatentTTS.
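The two sampling strategies can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: `h` stands in for one hidden state along a latent reasoning trajectory, and the function names and hyperparameters (`p`, `sigma`) are illustrative assumptions. The idea in both cases is the same: perturb an otherwise deterministic latent step so that repeated forward passes yield distinct trajectories that can later be aggregated.

```python
import numpy as np

def mc_dropout_sample(h, p=0.1, rng=None):
    """Monte Carlo Dropout-style sampling: randomly zero coordinates of
    the latent vector and rescale by 1/(1-p), so each call produces a
    different stochastic version of the same latent step."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

def gaussian_noise_sample(h, sigma=0.05, rng=None):
    """Additive Gaussian Noise sampling: perturb the latent vector with
    isotropic noise of scale sigma."""
    if rng is None:
        rng = np.random.default_rng()
    return h + sigma * rng.normal(size=h.shape)

# Draw several stochastic latent "branches" from one deterministic state.
h = np.ones(8)
rng = np.random.default_rng(0)
dropout_branches = [mc_dropout_sample(h, p=0.25, rng=rng) for _ in range(4)]
noise_branches = [gaussian_noise_sample(h, sigma=0.1, rng=rng) for _ in range(4)]
```

In a full pipeline, each branch would be fed back into the model to continue its own latent trajectory, and an aggregator (e.g. a reward model scoring each trajectory) would select among the resulting answers.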