GEM: A Gym for Agentic LLMs

Abstract

The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, in which agents acquire skills by interacting with complex environments. To facilitate this transition, we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts that demonstrate how to use GEM with five popular RL training frameworks. We also provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which, unlike GRPO, is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO, and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM functions as a convenient evaluation toolkit in addition to a training environment. We hope this framework can help accelerate future agentic LLM research.
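The abstract names Return Batch Normalization (ReBN) but does not spell out its computation. As a rough illustration only, the sketch below assumes that ReBN computes a return-to-go for every turn from dense per-turn rewards and then standardizes those returns (subtract the mean, divide by the standard deviation) over all turns in the batch. The function names `discounted_returns` and `rebn_advantages`, the discount `gamma`, and the stabilizer `eps` are hypothetical and are not taken from GEM.

```python
import math
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go for one episode: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn accumulates the rewards that follow it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def rebn_advantages(
    batch_rewards: List[List[float]],
    gamma: float = 1.0,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Assumed ReBN: standardize per-turn returns over the whole batch.

    batch_rewards holds one list of per-turn rewards per episode;
    episodes may differ in length.
    """
    per_episode = [discounted_returns(r, gamma) for r in batch_rewards]
    flat = [g for episode in per_episode for g in episode]
    mean = sum(flat) / len(flat)
    var = sum((g - mean) ** 2 for g in flat) / len(flat)
    std = math.sqrt(var)
    # Each turn receives its own normalized return as the REINFORCE weight.
    return [[(g - mean) / (std + eps) for g in episode] for episode in per_episode]


if __name__ == "__main__":
    # Two episodes of different lengths with per-turn rewards.
    batch = [[0.0, 1.0], [0.0, 0.0, 0.0]]
    for episode_advantages in rebn_advantages(batch):
        print([round(a, 3) for a in episode_advantages])
```

Under this assumption, every turn gets its own weight in the policy-gradient loss, which is one way to read the abstract's claim about per-turn rewards and credit assignment. A scheme that assigns a single normalized score to a whole trajectory would give every turn in that trajectory the same weight.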
