Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

Abstract

Test-Time Scaling (TTS) enhances the reasoning ability of large language models (LLMs) by allocating additional computation during inference. However, existing approaches rely primarily on output-level sampling and overlook the role of model architecture. In mainstream Mixture-of-Experts (MoE) LLMs, we observe that varying the number of activated experts yields complementary solution sets while accuracy remains stable, revealing a new and underexplored source of diversity. Motivated by this observation, we propose Dynamic Experts Search (DES), a TTS strategy that elevates expert activation into a controllable dimension of the search space. DES integrates two key components: (1) Dynamic MoE, which enables direct control of the expert count during inference to generate diverse reasoning trajectories without additional cost; and (2) Expert Configuration Inheritance, which keeps the expert count consistent within a reasoning path while varying it across runs, thereby balancing stability and diversity throughout the search. Extensive experiments across MoE architectures, verifiers, and reasoning benchmarks (math, code, and knowledge) demonstrate that DES reliably outperforms TTS baselines, improving accuracy and stability without additional cost. These results highlight DES as a practical and scalable form of architecture-aware TTS, illustrating how structural flexibility in modern LLMs can advance reasoning.
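The two components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`top_k_route`, `init_candidates`, `expand`), the softmax renormalization over the selected experts, and the round-robin assignment of expert counts are all assumptions made for illustration only.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs. The weights are a softmax over
    the selected logits only (an assumed convention, not taken from the paper).
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def init_candidates(expert_counts, n):
    """Start n reasoning paths, cycling through the allowed expert counts
    so that different paths use different counts (hypothetical scheme)."""
    return [{"k": expert_counts[i % len(expert_counts)], "steps": []}
            for i in range(n)]

def expand(candidate, step_text):
    """Extend a reasoning path by one step; the child inherits the
    parent's expert count unchanged."""
    return {"k": candidate["k"], "steps": candidate["steps"] + [step_text]}
```

In this sketch, changing `k` at inference time plays the role of Dynamic MoE, and `expand` copying the parent's `k` plays the role of Expert Configuration Inheritance: the count is fixed along a path but differs between paths. A real search would also score candidates with a verifier, which is omitted here.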
