Gradient Boosting Reinforcement Learning

Abstract

Neural networks (NNs) achieve remarkable results across a wide range of tasks, but they lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, gradient boosting trees (GBTs) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBTs to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with that of their NN counterparts. Inspired by shared backbones in NNs, we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBTs within the RL paradigm, particularly in domains characterized by structured or categorical features.
