Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

September 26, 2025
Authors: Yixuan Han, Fan Ma, Ruijie Quan, Yi Yang
cs.AI

Abstract

Test-Time Scaling (TTS) enhances the reasoning ability of large language models (LLMs) by allocating additional computation during inference. However, existing approaches rely primarily on output-level sampling while overlooking the role of model architecture. In mainstream Mixture-of-Experts (MoE) LLMs, we observe that varying the number of activated experts yields complementary solution sets with stable accuracy, revealing a new and underexplored source of diversity. Motivated by this observation, we propose Dynamic Experts Search (DES), a TTS strategy that elevates expert activation into a controllable dimension of the search space. DES integrates two key components: (1) Dynamic MoE, which enables direct control of expert counts during inference to generate diverse reasoning trajectories without additional cost; and (2) Expert Configuration Inheritance, which preserves a consistent expert count within a reasoning path while varying it across runs, thereby balancing stability and diversity throughout the search. Extensive experiments across MoE architectures, verifiers, and reasoning benchmarks (i.e., math, code, and knowledge) demonstrate that DES reliably outperforms TTS baselines, improving accuracy and stability without additional cost. These results highlight DES as a practical and scalable form of architecture-aware TTS, illustrating how structural flexibility in modern LLMs can advance reasoning.
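
The two components are concrete enough to sketch. The toy PyTorch example below is a minimal illustration under stated assumptions, not the authors' implementation: `MoELayer`, `des_search`, and the norm-based `score_fn` (a stand-in for the verifier) are hypothetical names, and the real method operates on a pretrained MoE LLM scored by a trained verifier rather than on a single random layer.

```python
# Minimal sketch of the two DES components, assuming a standard
# top-k-routed MoE feed-forward layer. All names are illustrative.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy MoE layer whose number of active experts is chosen per call."""
    def __init__(self, d_model=32, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
            for _ in range(num_experts)
        )

    def forward(self, x, k):
        # Dynamic MoE: k is a run-time knob rather than a fixed constant.
        probs = self.router(x).softmax(-1)                 # (tokens, E)
        weights, idx = probs.topk(k, dim=-1)               # keep top-k experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalise over k
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

def des_search(layer, x, candidate_ks=(2, 4, 6), runs=6, steps=3, score_fn=None):
    """Sample an expert count once per run, keep it for every step of that
    path (Expert Configuration Inheritance), and return the best-scoring path."""
    score_fn = score_fn or (lambda h: h.norm().item())  # stand-in verifier
    best, best_score = None, float("-inf")
    for _ in range(runs):
        k = candidate_ks[torch.randint(len(candidate_ks), (1,)).item()]
        h = x
        for _ in range(steps):   # the whole path inherits the same k
            h = layer(h, k=k)
        s = score_fn(h)
        if s > best_score:
            best, best_score = (h, k), s
    return best

# Usage: different k values route tokens through different expert subsets,
# giving the complementary solution sets the abstract describes.
layer = MoELayer()
tokens = torch.randn(5, 32)
hidden, chosen_k = des_search(layer, tokens)
```

Sampling k once per run rather than once per step is the point of the inheritance component: the expert configuration stays stable along each reasoning path, while diversity comes from varying it across runs.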