Gradient Boosting Reinforcement Learning

Abstract

Neural networks (NNs) achieve remarkable results in various tasks but lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, Gradient Boosting Trees (GBT) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBT to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with that of their NN counterparts. Inspired by shared backbones in NNs, we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBT within the RL paradigm, particularly in domains characterized by structured or categorical features.
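To make the tree-sharing idea more concrete, here is a minimal NumPy sketch of one way a single ensemble could serve both the policy and the value function with distinct learning rates. It is an illustration under stated assumptions, not the GBRL implementation or its API: the class name `SharedStumpEnsemble`, the use of depth-1 trees, the exhaustive split search, and the toy supervised gradients standing in for actor-critic gradients are all hypothetical choices made for brevity.

```python
import numpy as np


class SharedStumpEnsemble:
    """Hypothetical sketch: depth-1 trees whose leaves hold [policy logits..., value]."""

    def __init__(self, n_actions, policy_lr, value_lr):
        self.n_actions = n_actions
        # One learning rate per policy logit, plus a separate one for the value.
        self.lrs = np.array([policy_lr] * n_actions + [value_lr])
        self.stumps = []  # each entry: (feature index, threshold, left leaf, right leaf)

    def predict(self, X):
        out = np.zeros((X.shape[0], self.n_actions + 1))
        for feat, thr, left, right in self.stumps:
            mask = X[:, feat] <= thr
            out[mask] += left
            out[~mask] += right
        return out[:, :self.n_actions], out[:, -1]

    def boost(self, X, grads):
        """Fit one shared stump to the negative gradients of all outputs."""
        target = -grads
        best = None
        for feat in range(X.shape[1]):
            # Dropping the largest value keeps the right-hand side non-empty.
            for thr in np.unique(X[:, feat])[:-1]:
                mask = X[:, feat] <= thr
                left, right = target[mask].mean(0), target[~mask].mean(0)
                err = ((target[mask] - left) ** 2).sum() + ((target[~mask] - right) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, feat, thr, left, right)
        if best is None:
            return  # no valid split exists
        _, feat, thr, left, right = best
        # Same tree structure for every output; only the step size differs.
        self.stumps.append((feat, thr, self.lrs * left, self.lrs * right))


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Toy data (an assumption for illustration): the sign of feature 0 decides
# both the preferred action and the value target.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
actions = (X[:, 0] > 0).astype(int)
returns = np.where(X[:, 0] > 0, 1.0, -1.0)

model = SharedStumpEnsemble(n_actions=2, policy_lr=0.5, value_lr=0.1)
initial_mse = np.mean((model.predict(X)[1] - returns) ** 2)

for _ in range(20):
    logits, values = model.predict(X)
    policy_grad = softmax(logits) - np.eye(2)[actions]  # cross-entropy gradient w.r.t. logits
    value_grad = values - returns                       # squared-error gradient (up to a constant)
    model.boost(X, np.column_stack([policy_grad, value_grad]))

logits, values = model.predict(X)
final_mse = np.mean((values - returns) ** 2)
accuracy = np.mean(logits.argmax(axis=1) == actions)
```

In this sketch each boosting round adds one tree whose split is chosen from the combined policy and value gradients, and the per-output learning rates scale the leaf values afterwards. This mirrors, in a very simplified form, how a shared structure can be updated at different speeds for the actor and the critic; the actual framework may differ in every detail.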
