SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning

Abstract

Test-Time Scaling (TTS) refers to approaches that improve reasoning performance by allocating extra computation during inference, without altering the model's parameters. Existing TTS methods operate in a discrete token space by generating more intermediate steps, whereas recent work on Coconut and SoftCoT has demonstrated that thinking in a continuous latent space can further enhance reasoning performance. Such latent thoughts encode informative thinking without the information loss associated with autoregressive token generation, which has sparked increased interest in continuous-space reasoning. Unlike discrete decoding, where repeated sampling enables the exploration of diverse reasoning paths, latent representations in continuous space are fixed for a given input. This limits diverse exploration, because all decoded paths originate from the same latent thought. To overcome this limitation, we introduce SoftCoT++, which extends SoftCoT to the Test-Time Scaling paradigm by enabling diverse exploration of thinking paths. Specifically, we perturb latent thoughts via multiple specialized initial tokens and apply contrastive learning to promote diversity among soft thought representations. Experiments across five reasoning benchmarks and two distinct LLM architectures demonstrate that SoftCoT++ significantly improves on SoftCoT and also outperforms SoftCoT with self-consistency scaling. Moreover, it shows strong compatibility with conventional scaling techniques such as self-consistency. Source code is available at https://github.com/xuyige/SoftCoT.
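
Two ideas mentioned in the abstract can be illustrated with a minimal sketch: self-consistency, which aggregates several decoded answers by majority vote, and diversity among soft thought representations, measured here by mean pairwise cosine similarity. This is not the SoftCoT++ implementation. The function names, the use of plain Python lists as stand-ins for soft thought vectors, and the choice of cosine similarity as a diversity measure are all assumptions made for illustration; the contrastive objective actually used is not specified in the abstract.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(thoughts):
    """Average cosine similarity over all distinct pairs of vectors.

    Hypothetical diversity measure (an assumption, not the paper's loss):
    a lower value means the soft thoughts point in more different directions.
    Requires at least two vectors.
    """
    pairs = [
        (i, j)
        for i in range(len(thoughts))
        for j in range(i + 1, len(thoughts))
    ]
    return sum(cosine(thoughts[i], thoughts[j]) for i, j in pairs) / len(pairs)


def majority_vote(answers):
    """Self-consistency aggregation: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins for soft thoughts: identical vectors versus orthogonal ones.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0]]

# Toy final answers decoded from several reasoning paths.
sampled_answers = ["42", "41", "42"]
final_answer = majority_vote(sampled_answers)
```

In this toy setting, identical vectors give a mean pairwise similarity of 1 and orthogonal vectors give 0, which mirrors the abstract's point that paths decoded from a single fixed latent thought offer no diversity to aggregate over.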
