ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces
February 12, 2026
Authors: Xin Xu, Tong Yu, Xiang Chen, Haoliang Wang, Julian McAuley, Saayan Mitra
cs.AI
Abstract
Recent work explores latent reasoning to improve reasoning efficiency by replacing explicit reasoning trajectories with continuous representations in a latent space, yet its effectiveness varies across settings. Our analysis of model confidence dynamics under latent reasoning reveals that thinking trajectories ending in incorrect answers contain fewer low-confidence steps than those ending in correct answers. We further suggest that soft embeddings aggregated from multiple low-confidence thinking alternatives may introduce and propagate noise, leading to high confidence in unreliable reasoning trajectories. Motivated by these observations, we propose ThinkRouter, an inference-time confidence-aware routing mechanism that avoids overconfident, noisy states for efficient reasoning. ThinkRouter routes thinking to the discrete token space when model confidence is low, and to the latent space otherwise. Extensive experiments on STEM reasoning and coding benchmarks across diverse large reasoning models demonstrate that ThinkRouter outperforms explicit CoT, random routing, and latent reasoning baselines in accuracy, achieving an average improvement of 19.70 points in Pass@1 while reducing generation length by up to 15.55%. Further comprehensive analysis reveals that ThinkRouter calibrates errors arising from both explicit CoT and latent reasoning, and accelerates end-of-thinking token generation by globally lowering model confidence.
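The routing rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold `tau`, the use of max next-token probability as the confidence measure, and the probability-weighted soft embedding are assumptions made for the sake of the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the vocabulary logits.
    z = np.exp(x - x.max())
    return z / z.sum()

def route_step(logits, embedding_table, tau=0.9):
    """One decoding step of confidence-aware routing (illustrative sketch).

    If model confidence (here: the max next-token probability) falls below
    `tau`, route to the discrete token space by committing to the argmax
    token's embedding; otherwise stay in the latent space with a soft,
    probability-weighted embedding that aggregates all token alternatives.
    """
    probs = softmax(logits)
    if probs.max() < tau:
        # Discrete route: snap to a single token, discarding the
        # low-confidence alternatives that would otherwise inject noise.
        return embedding_table[probs.argmax()], "discrete"
    # Latent route: soft embedding mixes the embeddings of all candidates,
    # weighted by their probabilities.
    return probs @ embedding_table, "latent"

# Toy embedding table: vocabulary of 4 tokens, 3-dimensional embeddings.
table = np.arange(12, dtype=float).reshape(4, 3)

# A sharply peaked distribution stays in the latent space; a flat,
# uncertain distribution is routed to the discrete token space.
_, peaked_route = route_step(np.array([10.0, 0.0, 0.0, 0.0]), table)
flat_emb, flat_route = route_step(np.array([1.0, 1.1, 1.0, 1.0]), table)
```

In an actual decoding loop, the returned embedding would be fed back as the next-step input in place of a sampled token's embedding whenever the latent route is taken.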