Mixtures of Experts Unlock Parameter Scaling for Deep RL

Abstract

The recent rapid progress in (self-)supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive in reinforcement learning, however, where increasing a model's parameter count often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, as evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
