梯度提升强化学习
Gradient Boosting Reinforcement Learning
July 11, 2024
作者: Benjamin Fuhrer, Chen Tessler, Gal Dalal
cs.AI
摘要
神经网络(NN)在各种任务中取得了显著的成果,但缺乏关键特征:可解释性、对分类特征的支持以及适用于边缘设备的轻量级实现。尽管正在进行的努力旨在解决这些挑战,但梯度提升树(GBT)本质上满足了这些要求。因此,GBT已成为许多现实世界应用和竞赛中监督学习任务的首选方法。然而,它们在在线学习场景中的应用,特别是在强化学习(RL)中,受到了限制。在这项工作中,我们通过引入梯度提升强化学习(GBRL)框架来弥合这一差距,该框架将GBT的优势扩展到RL领域。使用GBRL框架,我们实现了各种演员-评论家算法,并将它们的性能与它们的NN对应物进行了比较。受到NN中共享骨干的启发,我们为具有不同学习率的策略和值函数引入了一种共享树方法,从而在数百万次交互中提高了学习效率。GBRL在各种任务中取得了竞争性能,擅长处理具有结构化或分类特征的领域。此外,我们提供了一个高性能的、GPU加速的实现,可以与广泛使用的RL库无缝集成(可在https://github.com/NVlabs/gbrl 获取)。GBRL扩展了RL从业者的工具包,展示了GBT在RL范式中的可行性和潜力,特别是在具有结构化或分类特征的领域。
English
Neural networks (NN) achieve remarkable results in various tasks, but lack
key characteristics: interpretability, support for categorical features, and
lightweight implementations suitable for edge devices. While ongoing efforts
aim to address these challenges, Gradient Boosting Trees (GBT) inherently meet
these requirements. As a result, GBTs have become the go-to method for
supervised learning tasks in many real-world applications and competitions.
However, their application in online learning scenarios, notably in
reinforcement learning (RL), has been limited. In this work, we bridge this gap
by introducing Gradient-Boosting RL (GBRL), a framework that extends the
advantages of GBT to the RL domain. Using the GBRL framework, we implement
various actor-critic algorithms and compare their performance with their NN
counterparts. Inspired by shared backbones in NN we introduce a tree-sharing
approach for policy and value functions with distinct learning rates, enhancing
learning efficiency over millions of interactions. GBRL achieves competitive
performance across a diverse array of tasks, excelling in domains with
structured or categorical features. Additionally, we present a
high-performance, GPU-accelerated implementation that integrates seamlessly
with widely-used RL libraries (available at https://github.com/NVlabs/gbrl).
GBRL expands the toolkit for RL practitioners, demonstrating the viability and
promise of GBT within the RL paradigm, particularly in domains characterized by
structured or categorical features.Summary
AI-Generated Summary