Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

Abstract

Test-Time Scaling (TTS) enhances the reasoning ability of large language models (LLMs) by allocating additional computation during inference. However, existing approaches primarily rely on output-level sampling while overlooking the role of model architecture. In mainstream Mixture-of-Experts (MoE) LLMs, we observe that varying the number of activated experts yields complementary solution sets with stable accuracy, revealing a new and underexplored source of diversity. Motivated by this observation, we propose Dynamic Experts Search (DES), a TTS strategy that elevates expert activation into a controllable dimension of the search space. DES integrates two key components: (1) Dynamic MoE, which enables direct control of expert counts during inference to generate diverse reasoning trajectories without additional cost; and (2) Expert Configuration Inheritance, which preserves consistent expert counts within a reasoning path while varying them across runs, thereby balancing stability and diversity throughout the search. Extensive experiments across MoE architectures, verifiers and reasoning benchmarks (i.e., math, code and knowledge) demonstrate that DES reliably outperforms TTS baselines, enhancing accuracy and stability without additional cost. These results highlight DES as a practical and scalable form of architecture-aware TTS, illustrating how structural flexibility in modern LLMs can advance reasoning.
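The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`route_top_k`, `expand`, `search`), the dictionary-based path representation, and the beam-style pruning by a verifier score are all assumptions made for illustration. The sketch shows only the two ideas the abstract states: the number of activated experts `k` is a parameter chosen at inference time, and every continuation of a reasoning path inherits the `k` of its parent.

```python
import math


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights.

    Hypothetical stand-in for an MoE router whose expert count is set at
    inference time rather than fixed by the model configuration.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)
    # Softmax restricted to the selected experts (shifted for stability).
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: w / total for i, w in exps.items()}


def init_paths(expert_counts, paths_per_count):
    """Start several empty reasoning paths for each candidate expert count."""
    return [{"k": k, "steps": []}
            for k in expert_counts
            for _ in range(paths_per_count)]


def expand(path, n_children, propose_step):
    """Extend a path; each child inherits the parent's expert count."""
    return [{"k": path["k"],
             "steps": path["steps"] + [propose_step(path, path["k"])]}
            for _ in range(n_children)]


def search(expert_counts, paths_per_count, depth, beam, n_children,
           propose_step, score):
    """Assumed beam-style search: expand every path, keep the best `beam`."""
    paths = init_paths(expert_counts, paths_per_count)
    for _ in range(depth):
        children = [child
                    for path in paths
                    for child in expand(path, n_children, propose_step)]
        paths = sorted(children, key=score, reverse=True)[:beam]
    return paths
```

Here `propose_step(path, k)` stands in for one generation step of the model run with `k` active experts, and `score(path)` stands in for a verifier; both are placeholders supplied by the caller. Because `k` is stored on the path and copied to its children, diversity comes from the different initial expert counts while each individual trajectory stays on a single configuration.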
