Gradient Boosting Reinforcement Learning
July 11, 2024
Authors: Benjamin Fuhrer, Chen Tessler, Gal Dalal
cs.AI
Abstract
Neural networks (NN) achieve remarkable results in various tasks, but lack
key characteristics: interpretability, support for categorical features, and
lightweight implementations suitable for edge devices. While ongoing efforts
aim to address these challenges, Gradient Boosting Trees (GBT) inherently meet
these requirements. As a result, GBTs have become the go-to method for
supervised learning tasks in many real-world applications and competitions.
However, their application in online learning scenarios, notably in
reinforcement learning (RL), has been limited. In this work, we bridge this gap
by introducing Gradient-Boosting RL (GBRL), a framework that extends the
advantages of GBT to the RL domain. Using the GBRL framework, we implement
various actor-critic algorithms and compare their performance with their NN
counterparts. Inspired by shared backbones in NNs, we introduce a tree-sharing
approach for policy and value functions with distinct learning rates, enhancing
learning efficiency over millions of interactions. GBRL achieves competitive
performance across a diverse array of tasks, excelling in domains with
structured or categorical features. Additionally, we present a
high-performance, GPU-accelerated implementation that integrates seamlessly
with widely used RL libraries (available at https://github.com/NVlabs/gbrl).
GBRL expands the toolkit for RL practitioners, demonstrating the viability and
promise of GBT within the RL paradigm, particularly in domains characterized by
structured or categorical features.
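The tree-sharing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the GBRL library's API and not the authors' implementation. It is a toy built on stated assumptions: depth-1 trees (stumps), a two-action contextual bandit with one informative and one irrelevant categorical feature, a plain policy-gradient loss paired with a squared-error critic, and hypothetical names (`SharedTreeActorCritic`, `fit_shared_stump`, `train_toy`) and learning rates (0.1 for the policy, 0.01 for the value). Each boosting step fits one split shared by all outputs, then scales the policy and value leaf entries by their own learning rates.

```python
import math
import random


def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def mean_vec(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_shared_stump(X, grads):
    """Choose ONE (feature, threshold) split for all outputs (logits + value)."""
    best = None
    for j in range(len(X[0])):
        # Drop the largest value so the right leaf is never empty.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [g for x, g in zip(X, grads) if x[j] <= t]
            right = [g for x, g in zip(X, grads) if x[j] > t]
            ml, mr = mean_vec(left), mean_vec(right)
            # Squared-error gain summed over every output dimension.
            score = (len(left) * sum(v * v for v in ml)
                     + len(right) * sum(v * v for v in mr))
            if best is None or score > best[0]:
                best = (score, j, t, ml, mr)
    if best is None:
        raise ValueError("every feature is constant; no split available")
    return best[1:]


class SharedTreeActorCritic:
    """Hypothetical ensemble: each stump's leaves hold [policy logits..., value]."""

    def __init__(self, n_actions, policy_lr=0.1, value_lr=0.01):
        self.n_actions = n_actions
        # Distinct step sizes per output: policy columns first, value last.
        self.lr = [policy_lr] * n_actions + [value_lr]
        self.stumps = []

    def forward(self, x):
        out = [0.0] * (self.n_actions + 1)
        for j, t, left, right in self.stumps:
            leaf = left if x[j] <= t else right
            out = [o + v for o, v in zip(out, leaf)]
        return out[:-1], out[-1]

    def boost(self, X, grads):
        j, t, ml, mr = fit_shared_stump(X, grads)
        # Functional gradient step: leaf = -lr * mean gradient, per output.
        scaled = [[-lr * g for lr, g in zip(self.lr, leaf)] for leaf in (ml, mr)]
        self.stumps.append((j, t, scaled[0], scaled[1]))


def actor_critic_grad(logits, value, action, ret):
    """Per-sample gradient of -adv*log pi(a|s) + 0.5*(V - ret)^2 w.r.t. the outputs."""
    probs = softmax(logits)
    adv = ret - value  # treated as a constant in the policy term
    g_policy = [-adv * ((1.0 if a == action else 0.0) - p)
                for a, p in enumerate(probs)]
    return g_policy + [value - ret]


def train_toy(iterations=100, batch=128, seed=0):
    rng = random.Random(seed)
    model = SharedTreeActorCritic(n_actions=2)
    for _ in range(iterations):
        X, grads = [], []
        for _ in range(batch):
            # Feature 0 decides the rewarded action; feature 1 is irrelevant.
            x = (rng.randrange(2), rng.randrange(3))
            logits, value = model.forward(x)
            action = 1 if rng.random() < softmax(logits)[1] else 0
            ret = 1.0 if action == x[0] else 0.0
            X.append(x)
            grads.append(actor_critic_grad(logits, value, action, ret))
        model.boost(X, grads)
    return model
```

Calling `train_toy()` and then `model.forward((0, 1))` returns the learned logits and value estimate for a state whose first feature is 0. The design choice worth noting is that the split search sees the stacked policy and value gradients, so the two heads share structure while `self.lr` keeps their step sizes separate.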