Parallel Test-Time Scaling for Latent Reasoning Models
October 9, 2025
Authors: Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li
cs.AI
Abstract
Parallel test-time scaling (TTS) is a pivotal approach for enhancing large
language models (LLMs), typically by sampling multiple token-based
chains-of-thought in parallel and aggregating outcomes through voting or
search. Recent advances in latent reasoning, where intermediate reasoning
unfolds in continuous vector spaces, offer a more efficient alternative to
explicit Chain-of-Thought, yet whether such latent models can similarly benefit
from parallel TTS remains open, mainly due to the absence of sampling
mechanisms in continuous space, and the lack of probabilistic signals for
advanced trajectory aggregation. This work enables parallel TTS for latent
reasoning models by addressing the above issues. For sampling, we introduce two
uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive
Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM)
trained with a step-wise contrastive objective to score and guide latent
reasoning. Extensive experiments and visualization analyses show that both
sampling strategies scale effectively with compute and exhibit distinct
exploration dynamics, while LatentRM enables effective trajectory selection.
Together, our explorations open a new direction for scalable inference in
continuous spaces. Code released at https://github.com/YRYangang/LatentTTS.
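The two sampling strategies named in the abstract can be illustrated with a toy, dependency-free sketch. This is an assumption-laden simplification: latents are plain Python lists rather than neural hidden states, and `step_fn`, `perturb`, and the function names below are hypothetical stand-ins, not the paper's actual API.

```python
import random

def additive_gaussian_noise(latent, sigma=0.1, rng=random):
    # Additive Gaussian Noise: perturb each latent dimension with N(0, sigma^2).
    return [x + rng.gauss(0.0, sigma) for x in latent]

def mc_dropout(latent, p=0.1, rng=random):
    # Monte Carlo Dropout at inference time: zero each dimension with
    # probability p and rescale survivors by 1/(1-p) (inverted dropout).
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in latent]

def sample_trajectories(step_fn, z0, n_steps, n_samples, perturb):
    # Draw n_samples stochastic latent trajectories by injecting a
    # perturbation before every deterministic reasoning step.
    trajectories = []
    for _ in range(n_samples):
        z = list(z0)
        traj = [z]
        for _ in range(n_steps):
            z = step_fn(perturb(z))
            traj.append(z)
        trajectories.append(traj)
    return trajectories
```

In a real latent reasoning model, `step_fn` would be the model's recurrent latent-update forward pass, and the sampled trajectories would then be scored by an aggregator such as the abstract's LatentRM.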