SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning

May 16, 2025
Authors: Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao
cs.AI

Abstract

Test-Time Scaling (TTS) refers to approaches that improve reasoning performance by allocating extra computation during inference, without altering the model's parameters. While existing TTS methods operate in a discrete token space by generating more intermediate steps, recent studies in Coconut and SoftCoT have demonstrated that thinking in the continuous latent space can further enhance the reasoning performance. Such latent thoughts encode informative thinking without the information loss associated with autoregressive token generation, sparking increased interest in continuous-space reasoning. Unlike discrete decoding, where repeated sampling enables exploring diverse reasoning paths, latent representations in continuous space are fixed for a given input, which limits diverse exploration, as all decoded paths originate from the same latent thought. To overcome this limitation, we introduce SoftCoT++ to extend SoftCoT to the Test-Time Scaling paradigm by enabling diverse exploration of thinking paths. Specifically, we perturb latent thoughts via multiple specialized initial tokens and apply contrastive learning to promote diversity among soft thought representations. Experiments across five reasoning benchmarks and two distinct LLM architectures demonstrate that SoftCoT++ significantly boosts SoftCoT and also outperforms SoftCoT with self-consistency scaling. Moreover, it shows strong compatibility with conventional scaling techniques such as self-consistency. Source code is available at https://github.com/xuyige/SoftCoT.
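The core idea described above, namely perturbing one shared latent thought into several distinct soft thoughts and encouraging them to stay apart, can be sketched in a few lines. This is an illustrative stand-in, not the paper's implementation: the function names are hypothetical, the perturbations here are random offsets rather than the learned specialized initial tokens, and the objective shown is a simple mean pairwise cosine similarity standing in for the paper's contrastive loss.

```python
import math
import random


def perturb_thoughts(latent, num_paths, scale=0.1, seed=0):
    """Create num_paths variants of a single latent thought vector.

    In SoftCoT++ the variation comes from specialized learned initial
    tokens; here we substitute fixed random Gaussian offsets purely
    for illustration.
    """
    rng = random.Random(seed)
    return [
        [x + scale * rng.gauss(0, 1) for x in latent]
        for _ in range(num_paths)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def diversity_loss(thoughts):
    """Mean pairwise cosine similarity among the perturbed thoughts.

    Minimizing this pushes the soft thought representations apart,
    which is the role the contrastive objective plays in the paper.
    """
    k = len(thoughts)
    sims = [
        cosine(thoughts[i], thoughts[j])
        for i in range(k)
        for j in range(i + 1, k)
    ]
    return sum(sims) / len(sims)


# Example: spawn 4 reasoning paths from one latent thought, then score
# how similar they still are; each perturbed thought would be decoded
# into its own reasoning chain at test time.
latent = [0.5] * 8
thoughts = perturb_thoughts(latent, num_paths=4)
loss = diversity_loss(thoughts)
```

Each perturbed thought then seeds its own decoding path, so test-time compute scales with `num_paths` while the underlying model parameters stay fixed, mirroring the TTS setting the abstract describes.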
