SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Abstract

Recent work shows that, beyond discrete reasoning through explicit chain-of-thought steps, which are limited by the boundaries of natural language, large language models (LLMs) can also reason continuously in latent space, which carries richer information per step and thereby improves token efficiency. Despite this promise, latent reasoning still faces two challenges, especially in training-free settings: 1) purely latent reasoning broadens the search distribution by maintaining multiple implicit paths, which diffuses probability mass, introduces noise, and impedes convergence to a single high-confidence solution, thereby hurting accuracy; and 2) overthinking persists even without explicit text, wasting tokens and degrading efficiency. To address these issues, we introduce SwiReasoning, a training-free framework for LLM reasoning that features two key innovations: 1) SwiReasoning dynamically switches between explicit and latent reasoning, guided by block-wise confidence estimated from entropy trends in next-token distributions, to balance exploration and exploitation and promote timely convergence; and 2) by limiting the maximum number of thinking-block switches, SwiReasoning curbs overthinking and improves token efficiency across varying problem difficulties. On widely used mathematics and STEM benchmarks, SwiReasoning consistently improves average accuracy by 1.5% to 2.8% across reasoning LLMs of different model families and scales. Furthermore, under constrained budgets, SwiReasoning improves average token efficiency by 56% to 79%, with larger gains as budgets tighten.
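The two mechanisms in the abstract, entropy-guided switching between explicit and latent reasoning and a cap on the number of thinking-block switches, can be illustrated with a small controller. This is a minimal sketch under stated assumptions, not the authors' implementation. The specific rule (a drop in entropy below the value recorded at the start of the current block triggers explicit mode, a rise above it triggers latent mode), the soft-embedding form of a latent step, and the names `SwitchController`, `soft_embedding`, `hard_embedding`, and `max_switches` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


def entropy(probs: Vector) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hard_embedding(probs: Vector, table: List[Vector]) -> Vector:
    """Explicit step: embed the single most likely token (greedy decoding)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return table[best]


def soft_embedding(probs: Vector, table: List[Vector]) -> Vector:
    """Latent step (assumed soft-thinking style): a probability-weighted
    mixture of all token embeddings, so several paths stay in superposition."""
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]


@dataclass
class SwitchController:
    """Hypothetical entropy-trend controller with a cap on block switches."""
    max_switches: int                    # hypothetical budget of block switches
    mode: str = "latent"                 # assumed starting mode
    switches: int = 0
    ref_entropy: Optional[float] = None  # entropy at the start of the current block

    def step(self, probs: Vector) -> str:
        """Return 'latent', 'explicit', or 'answer' for the next decoding step."""
        h = entropy(probs)
        if self.ref_entropy is None:
            self.ref_entropy = h  # first step only records the reference
        elif self.mode == "latent" and h < self.ref_entropy:
            # Entropy trending down = confidence rising: commit to explicit tokens.
            self._switch("explicit", h)
        elif self.mode == "explicit" and h > self.ref_entropy:
            # Entropy trending up = confidence falling: return to latent exploration.
            self._switch("latent", h)
        # The switch cap curbs overthinking: once spent, stop and force an answer.
        return "answer" if self.switches >= self.max_switches else self.mode

    def _switch(self, new_mode: str, h: float) -> None:
        self.mode = new_mode
        self.switches += 1
        self.ref_entropy = h  # the new block measures its trend from here


if __name__ == "__main__":
    ctrl = SwitchController(max_switches=2)
    dists = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]]
    print([ctrl.step(p) for p in dists])  # ['latent', 'explicit', 'answer']
```

In a hypothetical decoding loop, a step in `"latent"` mode would feed `soft_embedding(...)` back to the model as the next input, a step in `"explicit"` mode would feed `hard_embedding(...)`, and an `"answer"` result would end the thinking phase. Comparing against the entropy at the start of each block, rather than against a fixed threshold, is one simple way to react to the entropy *trend* that the abstract refers to.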
