Mixtures of Experts Unlock Parameter Scaling for Deep RL

Abstract

The recent rapid progress in (self-)supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
