Mixtures of Experts Unlock Parameter Scaling for Deep RL

Abstract

The recent rapid progress in (self-)supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive in reinforcement learning, however, where increasing a model's parameter count often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, as evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
