ChatPaper.ai


SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning

May 16, 2025
作者: Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao
cs.AI

Abstract

Test-Time Scaling (TTS) refers to approaches that improve reasoning performance by allocating extra computation during inference, without altering the model's parameters. While existing TTS methods operate in a discrete token space by generating more intermediate steps, recent studies on Coconut and SoftCoT have demonstrated that thinking in a continuous latent space can further enhance reasoning performance. Such latent thoughts encode informative thinking without the information loss associated with autoregressive token generation, sparking increased interest in continuous-space reasoning. Unlike discrete decoding, where repeated sampling enables exploration of diverse reasoning paths, latent representations in continuous space are fixed for a given input, which limits diverse exploration because all decoded paths originate from the same latent thought. To overcome this limitation, we introduce SoftCoT++, which extends SoftCoT to the Test-Time Scaling paradigm by enabling diverse exploration of thinking paths. Specifically, we perturb latent thoughts via multiple specialized initial tokens and apply contrastive learning to promote diversity among soft thought representations. Experiments across five reasoning benchmarks and two distinct LLM architectures demonstrate that SoftCoT++ significantly boosts SoftCoT and also outperforms SoftCoT with self-consistency scaling. Moreover, it shows strong compatibility with conventional scaling techniques such as self-consistency. Source code is available at https://github.com/xuyige/SoftCoT.
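The two core ideas in the abstract — perturbing one fixed latent thought with multiple specialized initial-token embeddings, and a contrastive-style objective that pushes the resulting soft thoughts apart — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, dimensions, and the mean-pairwise-cosine-similarity loss are illustrative assumptions standing in for the paper's actual contrastive objective.

```python
import numpy as np

def perturb_latent_thought(base_thought, init_embeddings):
    """Produce K distinct soft thoughts from one fixed latent thought by
    adding K specialized initial-token embeddings (illustrative scheme).

    base_thought:    (d,)   latent thought, fixed for a given input
    init_embeddings: (K, d) learnable, per-path initial-token embeddings
    returns:         (K, d) K perturbed soft thoughts
    """
    return base_thought[None, :] + init_embeddings

def diversity_loss(thoughts, eps=1e-8):
    """Contrastive-style diversity objective: mean pairwise cosine
    similarity among the K soft thoughts. Minimizing this value pushes
    the representations apart, enabling diverse decoding paths."""
    unit = thoughts / (np.linalg.norm(thoughts, axis=1, keepdims=True) + eps)
    sim = unit @ unit.T                       # (K, K) cosine similarities
    k = len(thoughts)
    off_diag = sim[~np.eye(k, dtype=bool)]    # drop self-similarities
    return float(off_diag.mean())

rng = np.random.default_rng(0)
base = rng.normal(size=16)        # one fixed latent thought for one input
inits = rng.normal(size=(4, 16))  # 4 specialized initial-token embeddings
thoughts = perturb_latent_thought(base, inits)
loss = diversity_loss(thoughts)   # value in [-1, 1]; lower = more diverse
```

In training, a gradient on `diversity_loss` would update `inits` so the K soft thoughts spread out; at test time, each perturbed thought seeds a separate reasoning path that conventional scaling (e.g., self-consistency voting) can then aggregate.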

