Gradient Boosting Reinforcement Learning

Abstract

Neural networks (NNs) achieve remarkable results across a wide range of tasks but lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, gradient boosting trees (GBTs) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBTs to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with that of their NN counterparts. Inspired by shared backbones in NNs, we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBTs within the RL paradigm, particularly in domains characterized by structured or categorical features.
