

ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces

February 12, 2026
Authors: Xin Xu, Tong Yu, Xiang Chen, Haoliang Wang, Julian McAuley, Saayan Mitra
cs.AI

Abstract

Recent work explores latent reasoning to improve reasoning efficiency by replacing explicit reasoning trajectories with continuous representations in a latent space, yet its effectiveness varies across settings. An analysis of model confidence dynamics under latent reasoning reveals that thinking trajectories ending in incorrect answers contain fewer low-confidence steps than those ending in correct answers. Meanwhile, we suggest that soft embeddings aggregated from multiple low-confidence thinking alternatives may introduce and propagate noise, leading to high confidence in unreliable reasoning trajectories. Motivated by these observations, we propose ThinkRouter, an inference-time confidence-aware routing mechanism that avoids overconfidence and noise to enable efficient reasoning. ThinkRouter routes thinking to the discrete token space when model confidence is low, and to the latent space otherwise. Extensive experiments on STEM reasoning and coding benchmarks across diverse large reasoning models demonstrate that ThinkRouter outperforms explicit CoT, random routing, and latent reasoning baselines in terms of accuracy, achieving an average improvement of 19.70 points in Pass@1 while reducing generation length by up to 15.55%. Further comprehensive analysis reveals that ThinkRouter calibrates errors arising from explicit CoT and latent reasoning, and accelerates end-of-thinking token generation by globally lowering model confidence.
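
To make the routing rule concrete, the sketch below illustrates one way a single confidence-aware thinking step could look, based only on the abstract's description: when the maximum next-token probability falls below a threshold, commit to a discrete token; otherwise, stay in the latent space by feeding back a soft embedding. The threshold `tau`, the top-k mixture width, and the Hugging Face-style model interface are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of confidence-aware routing between discrete and latent
# thinking steps. All hyperparameters and the model interface are assumed
# for illustration; the paper's actual implementation may differ.
import torch
import torch.nn.functional as F

def thinkrouter_step(model, embedding_layer, inputs_embeds, tau=0.8, top_k=5):
    """Produce the embedding for the next thinking step (batch size 1 assumed).

    Low confidence  -> route to the discrete token space (hard token embedding).
    High confidence -> route to the latent space (soft top-k mixture embedding).
    """
    with torch.no_grad():
        out = model(inputs_embeds=inputs_embeds)      # assumed HF-style forward
        logits = out.logits[:, -1, :]                  # next-token logits
        probs = F.softmax(logits, dim=-1)
        confidence, _ = probs.max(dim=-1)              # model confidence

        if confidence.item() < tau:
            # Low confidence: commit to a discrete token.
            token_id = probs.argmax(dim=-1)
            next_embed = embedding_layer(token_id).unsqueeze(1)
        else:
            # High confidence: soft embedding over the top-k token alternatives.
            top_p, top_ids = probs.topk(top_k, dim=-1)
            top_p = top_p / top_p.sum(dim=-1, keepdim=True)
            top_embeds = embedding_layer(top_ids)      # (batch, k, hidden)
            next_embed = (top_p.unsqueeze(-1) * top_embeds).sum(dim=1, keepdim=True)

    # Append the chosen embedding to the running sequence of thinking steps.
    return torch.cat([inputs_embeds, next_embed], dim=1)
```

In this reading, the discrete branch forces the model to re-ground its trajectory in actual tokens precisely where the abstract identifies latent reasoning as unreliable, while the soft-embedding branch preserves the efficiency benefit of latent reasoning on high-confidence steps.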