Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
April 2, 2026
Authors: Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu
cs.AI
Abstract
Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, leading to load imbalance and rigid computation allocation. We show that expert-choice (EC) routing is a better fit for DLMs: it provides deterministic load balancing by design, yielding higher throughput and faster convergence than TC. Building on the property that EC capacity is externally controllable, we introduce timestep-dependent expert capacity, which varies expert allocation according to the denoising step. We find that allocating more capacity to low-mask-ratio steps consistently achieves the best performance under matched FLOPs, and provide a mechanistic explanation: tokens in low-mask-ratio contexts exhibit an order-of-magnitude higher learning efficiency, so concentrating compute on these steps yields the largest marginal return. Finally, we show that existing pretrained TC DLMs can be retrofitted to EC by replacing only the router, achieving faster convergence and improved accuracy across diverse downstream tasks. Together, these results establish EC routing as a superior paradigm for DLM MoE models and demonstrate that computation in DLMs can be treated as an adaptive policy rather than a fixed architectural constant. Code is available at https://github.com/zhangshuibai/EC-DLM.
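
To make the two central ideas concrete, the following is a minimal sketch, not the authors' implementation. It shows expert-choice selection, in which each expert takes exactly `capacity` tokens so that per-expert load is balanced by construction, alongside a capacity that depends on the mask ratio of the current denoising step. The function names, the linear schedule, and the parameters `base_k` and `slope` are all hypothetical and chosen only for illustration; the schedule used in the paper may differ.

```python
import random


def expert_choice_route(scores, capacity):
    """Expert-choice routing: every expert picks its own top-`capacity` tokens.

    scores[t][e] is the router affinity between token t and expert e.
    Returns one list of token indices per expert. Each list has the same
    length, so the load per expert is fixed by design. A given token may be
    chosen by several experts or by none.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    capacity = min(capacity, num_tokens)
    assignments = []
    for e in range(num_experts):
        # Rank tokens by affinity to expert e, highest first.
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignments.append(ranked[:capacity])
    return assignments


def capacity_for_mask_ratio(mask_ratio, num_tokens, num_experts,
                            base_k=2.0, slope=1.0):
    """Hypothetical linear schedule (an assumption, not taken from the paper).

    `base_k` is the average number of experts per token. Low mask ratios get
    more capacity and high mask ratios get less. If mask ratios are sampled
    uniformly from [0, 1], the multiplier averages to `base_k`, which roughly
    matches the compute of a constant-capacity baseline before rounding.
    """
    experts_per_token = base_k + slope * (0.5 - mask_ratio)
    return max(1, int(round(experts_per_token * num_tokens / num_experts)))


# Small demonstration with random router scores.
rng = random.Random(0)
demo_scores = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
for demo_ratio in (0.1, 0.5, 0.9):
    demo_capacity = capacity_for_mask_ratio(demo_ratio, num_tokens=16, num_experts=4)
    demo_routes = expert_choice_route(demo_scores, demo_capacity)
    print(demo_ratio, demo_capacity, [len(r) for r in demo_routes])
```

Because capacity is an argument to the routing function rather than a consequence of per-token choices, it can be changed per step without touching the router weights, which is the property the abstract refers to as externally controllable capacity.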