SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
October 13, 2024
Authors: Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
cs.AI
Abstract
Recent advances in CV and NLP have been largely driven by scaling up the
number of network parameters, despite traditional theories suggesting that
larger networks are prone to overfitting. These large networks avoid
overfitting by integrating components that induce a simplicity bias, guiding
models toward simple and generalizable solutions. However, in deep RL,
designing and scaling up networks have been less explored. Motivated by this
opportunity, we present SimBa, an architecture designed to scale up parameters
in deep RL by injecting a simplicity bias. SimBa consists of three components:
(i) an observation normalization layer that standardizes inputs with running
statistics, (ii) a residual feedforward block to provide a linear pathway from
the input to output, and (iii) a layer normalization to control feature
magnitudes. Scaling up parameters with SimBa consistently improves the sample
efficiency of various deep RL algorithms, including off-policy, on-policy, and
unsupervised methods. Moreover, simply integrating the SimBa architecture into
SAC matches or surpasses state-of-the-art deep RL methods with high
computational efficiency across DMC, MyoSuite, and HumanoidBench.
These results demonstrate SimBa's broad applicability and effectiveness across
diverse RL algorithms and environments.
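
The three components named in the abstract can be sketched as a forward pass in NumPy. This is a minimal illustration, not the authors' implementation: the abstract does not specify layer sizes, activation, or where normalization sits inside the residual block, so the pre-normalization placement, ReLU, 4x hidden expansion, weight initialization, and all names below (`RunningObsNorm`, `layer_norm`, `SimBaSketch`) are assumptions. Learnable gain and bias terms are omitted for brevity.

```python
import numpy as np


class RunningObsNorm:
    """Component (i): standardize observations with running mean and variance."""

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 0
        self.eps = eps

    def update(self, batch):
        # Merge the batch statistics into the running statistics
        # (parallel mean/variance combination).
        b_mean, b_var, b_n = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        total = self.count + b_n
        delta = b_mean - self.mean
        m2 = self.var * self.count + b_var * b_n + delta**2 * self.count * b_n / total
        self.mean = self.mean + delta * b_n / total
        self.var = m2 / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + self.eps)


def layer_norm(x, eps=1e-5):
    """Component (iii): normalize each feature vector to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class SimBaSketch:
    """Observation norm -> linear embed -> residual blocks -> layer norm."""

    def __init__(self, obs_dim, hidden_dim=64, num_blocks=2, expansion=4, seed=0):
        rng = np.random.default_rng(seed)

        def init(n_in, n_out):
            # He-style initialization (an assumption, not taken from the abstract).
            return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

        self.obs_norm = RunningObsNorm(obs_dim)
        self.w_in = init(obs_dim, hidden_dim)
        self.blocks = [
            (init(hidden_dim, expansion * hidden_dim),
             init(expansion * hidden_dim, hidden_dim))
            for _ in range(num_blocks)
        ]

    def __call__(self, obs):
        x = self.obs_norm(obs) @ self.w_in
        for w1, w2 in self.blocks:
            # Component (ii): the identity skip keeps a linear path from the
            # block input to its output; the MLP only adds a correction to it.
            h = np.maximum(layer_norm(x) @ w1, 0.0) @ w2
            x = x + h
        return layer_norm(x)


# Usage with synthetic observations (hypothetical shapes).
rng = np.random.default_rng(0)
obs = rng.normal(5.0, 3.0, size=(256, 17))
net = SimBaSketch(obs_dim=17)
net.obs_norm.update(obs)
features = net(obs)
```

In an actor-critic method such as SAC, a network like this would stand in for the plain MLP inside the actor and the critic, with a task-specific linear head applied to `features`; increasing `hidden_dim` or `num_blocks` is one way to scale the parameter count.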