勾配ブースティング強化学習

要旨

ニューラルネットワーク（NN）は様々なタスクで顕著な成果を上げていますが、解釈可能性、カテゴリカル特徴量のサポート、エッジデバイス向けの軽量実装といった重要な特性を欠いています。これらの課題に対処するための取り組みが進行中ですが、勾配ブースティング木（GBT）はこれらの要件を本質的に満たしています。その結果、GBTは多くの実世界のアプリケーションやコンペティションにおける教師あり学習タスクの定番手法となっています。しかし、オンライン学習シナリオ、特に強化学習（RL）におけるGBTの応用は限られていました。本研究では、このギャップを埋めるために、GBTの利点をRL領域に拡張するGradient-Boosting RL（GBRL）フレームワークを導入します。GBRLフレームワークを用いて、様々なアクター・クリティックアルゴリズムを実装し、それらの性能をNNベースの対応手法と比較します。NNにおける共有バックボーンに着想を得て、異なる学習率を持つポリシー関数と価値関数のためのツリー共有アプローチを導入し、数百万回のインタラクションにわたる学習効率を向上させます。GBRLは、構造化された特徴量やカテゴリカル特徴量が支配的な領域で特に優れた性能を発揮し、多様なタスクにおいて競争力のある性能を達成します。さらに、広く使用されているRLライブラリとシームレスに統合する、高性能なGPUアクセラレーション実装を提供します（https://github.com/NVlabs/gbrl で入手可能）。GBRLは、RL実践者のためのツールキットを拡張し、特に構造化された特徴量やカテゴリカル特徴量が特徴的な領域において、RLパラダイム内でのGBTの実現可能性と将来性を示しています。

English

Neural networks (NN) achieve remarkable results in various tasks, but lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, Gradient Boosting Trees (GBT) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBT to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with their NN counterparts. Inspired by shared backbones in NN we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely-used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBT within the RL paradigm, particularly in domains characterized by structured or categorical features.

勾配ブースティング強化学習

Gradient Boosting Reinforcement Learning

要旨

Support