Parallel Test-Time Scaling for Latent Reasoning Models
October 9, 2025
Authors: Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li
cs.AI
Abstract
Parallel test-time scaling (TTS) is a pivotal approach for enhancing large
language models (LLMs), typically by sampling multiple token-based
chains-of-thought in parallel and aggregating outcomes through voting or
search. Recent advances in latent reasoning, where intermediate reasoning
unfolds in continuous vector spaces, offer a more efficient alternative to
explicit Chain-of-Thought, yet whether such latent models can similarly benefit
from parallel TTS remains open, mainly due to the absence of sampling
mechanisms in continuous space, and the lack of probabilistic signals for
advanced trajectory aggregation. This work enables parallel TTS for latent
reasoning models by addressing the above issues. For sampling, we introduce two
uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive
Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM)
trained with a step-wise contrastive objective to score and guide latent
reasoning. Extensive experiments and visualization analyses show that both
sampling strategies scale effectively with compute and exhibit distinct
exploration dynamics, while LatentRM enables effective trajectory selection.
Together, our explorations open a new direction for scalable inference in
continuous spaces. Code released at https://github.com/YRYangang/LatentTTS.
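The two sampling strategies named in the abstract can be illustrated with a toy, dependency-free sketch. This is an assumption-laden simplification: latents are plain Python lists rather than neural hidden states, and `step_fn`, `perturb`, and the function names below are hypothetical stand-ins, not the paper's actual API.

```python
import random

def additive_gaussian_noise(latent, sigma=0.1, rng=random):
    # Additive Gaussian Noise: perturb each latent dimension with N(0, sigma^2).
    return [x + rng.gauss(0.0, sigma) for x in latent]

def mc_dropout(latent, p=0.1, rng=random):
    # Monte Carlo Dropout at inference time: zero each dimension with
    # probability p and rescale survivors by 1/(1-p) (inverted dropout).
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in latent]

def sample_trajectories(step_fn, z0, n_steps, n_samples, perturb):
    # Draw n_samples stochastic latent trajectories by injecting a
    # perturbation before every deterministic reasoning step.
    trajectories = []
    for _ in range(n_samples):
        z = list(z0)
        traj = [z]
        for _ in range(n_steps):
            z = step_fn(perturb(z))
            traj.append(z)
        trajectories.append(traj)
    return trajectories
```

In a real latent reasoning model, `step_fn` would be the model's recurrent latent-update forward pass, and the sampled trajectories would then be scored by an aggregator such as the abstract's LatentRM.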