Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

Abstract

Test-Time Scaling (TTS) enhances the reasoning ability of large language models (LLMs) by allocating additional computation during inference. However, existing approaches rely primarily on output-level sampling and overlook the role of model architecture. In mainstream Mixture-of-Experts (MoE) LLMs, we observe that varying the number of activated experts yields complementary solution sets with stable accuracy, revealing a new and underexplored source of diversity. Motivated by this observation, we propose Dynamic Experts Search (DES), a TTS strategy that elevates expert activation into a controllable dimension of the search space. DES integrates two key components: (1) Dynamic MoE, which enables direct control of expert counts during inference to generate diverse reasoning trajectories without additional cost; and (2) Expert Configuration Inheritance, which preserves consistent expert counts within a reasoning path while varying them across runs, thereby balancing stability and diversity throughout the search. Extensive experiments across MoE architectures, verifiers, and reasoning benchmarks (math, code, and knowledge) demonstrate that DES reliably outperforms TTS baselines, improving accuracy and stability without additional cost. These results highlight DES as a practical and scalable form of architecture-aware TTS, illustrating how structural flexibility in modern LLMs can advance reasoning.
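A toy sketch may help make the two components concrete. It assumes an ordinary softmax top-k gate over scalar experts and models the search as a verifier-guided beam search. `top_k_route`, `moe_layer`, `Path`, `dynamic_experts_search`, `generate_step`, `verify`, and `k_choices` are all hypothetical names chosen for illustration, not the authors' code or any library API. The sketch shows only two things: `k` is passed at call time (the Dynamic MoE idea), and it is copied from parent to child within a path while differing across paths (the Expert Configuration Inheritance idea).

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Stable descending sort: ties keep their original expert order.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]


def moe_layer(x: float, experts: list[Callable[[float], float]],
              gate_logits: list[float], k: int) -> float:
    """Toy scalar MoE layer: weighted sum of the k routed experts' outputs.

    k is a call-time argument rather than a fixed hyperparameter; this sketch
    assumes that is the knob Dynamic MoE exposes.
    """
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


@dataclass
class Path:
    k: int                                   # expert count, fixed for the whole path
    steps: list[str] = field(default_factory=list)
    score: float = 0.0


def dynamic_experts_search(
    generate_step: Callable[[list[str], int, random.Random], str],  # hypothetical LLM step
    verify: Callable[[list[str]], float],                            # hypothetical verifier
    k_choices: list[int], n_paths: int, n_steps: int, beam: int, seed: int = 0,
) -> Path:
    """Illustrative verifier-guided beam search where each child inherits its parent's k."""
    rng = random.Random(seed)
    # Different paths start with different expert counts, so k becomes a search dimension.
    paths = [Path(k=k_choices[i % len(k_choices)]) for i in range(n_paths)]
    for _ in range(n_steps):
        children = []
        for p in paths:
            step = generate_step(p.steps, p.k, rng)
            child = Path(k=p.k, steps=p.steps + [step])  # inheritance: same k as the parent
            child.score = verify(child.steps)
            children.append(child)
        children.sort(key=lambda c: c.score, reverse=True)
        survivors = children[:beam]
        # Refill the beam by expanding survivors again; each keeps its own k.
        paths = [survivors[i % len(survivors)] for i in range(n_paths)]
    return max(paths, key=lambda p: p.score)
```

With `k_choices=[2, 4, 8]` and six paths, the search starts two paths at each expert count. No step ever changes a path's `k`, so a trajectory stays internally consistent, while diversity across trajectories comes from their differing `k` values plus whatever sampling `generate_step` performs with `rng`.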
