

GEM: A Gym for Agentic LLMs

October 1, 2025
作者: Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Simon Yu, Xiangxin Zhou, Haotian Xu, Shaopan Xiong, Bo Liu, Chenmien Tan, Chuen Yang Beh, Weixun Wang, Hao Zhu, Weiyan Shi, Diyi Yang, Michael Shieh, Yee Whye Teh, Wee Sun Lee, Min Lin
cs.AI

Abstract

The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills by interacting with complex environments. To facilitate this transition, we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating how to use GEM with five popular RL training frameworks. Alongside these, we provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which -- unlike GRPO -- is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO, and REINFORCE in both single- and multi-turn settings using GEM to shed light on their algorithmic designs. Lastly, beyond serving as a training environment, GEM also functions as a convenient evaluation toolkit. We hope this framework helps accelerate future agentic LLM research.
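The abstract describes a standardized environment-agent interface analogous to OpenAI-Gym. The sketch below illustrates what such a reset/step interaction loop looks like for a multi-turn text environment; `GuessNumberEnv` and the binary-search policy are toy stand-ins invented for illustration, not GEM's actual API or environments.

```python
# Hypothetical sketch of a Gym-style environment-agent loop for a
# multi-turn text task. All names here are illustrative assumptions,
# not GEM's real interface.
import random

class GuessNumberEnv:
    """Toy text environment exposing the classic reset/step interface."""
    def __init__(self, low=1, high=8, max_turns=5):
        self.low, self.high, self.max_turns = low, high, max_turns

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.target = rng.randint(self.low, self.high)
        self.turn = 0
        return f"Guess a number between {self.low} and {self.high}.", {}

    def step(self, action: str):
        # Returns (observation, reward, terminated, truncated, info),
        # with a dense per-turn reward signal.
        self.turn += 1
        guess = int(action)
        if guess == self.target:
            return "Correct!", 1.0, True, False, {}
        truncated = self.turn >= self.max_turns
        hint = "higher" if guess < self.target else "lower"
        return f"Try {hint}.", 0.0, False, truncated, {}

def run_episode(env, policy, seed=0):
    """Standard interaction loop: observe, act, receive reward, repeat."""
    obs, _ = env.reset(seed=seed)
    total, terminated, truncated = 0.0, False, False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
    return total

def make_policy(low=1, high=8):
    """Binary-search policy standing in for an LLM agent."""
    state = {"low": low, "high": high}
    def policy(obs):
        if "higher" in obs:
            state["low"] = state["last"] + 1
        elif "lower" in obs:
            state["high"] = state["last"] - 1
        state["last"] = (state["low"] + state["high"]) // 2
        return str(state["last"])
    return policy

print(run_episode(GuessNumberEnv(), make_policy()))  # prints 1.0
```

Keeping this interface uniform across environments is what lets the same training loop drive math, game, and tool-use tasks alike, and is what wrappers and vectorized executors build on.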
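The abstract contrasts ReBN with GRPO: unlike GRPO's group-relative scoring of whole responses, return batch normalization works with dense per-turn rewards. The paper's exact formulation is not given in the abstract; the following is a plausible reading in which per-turn discounted returns are normalized across the batch to form advantages.

```python
# Hedged sketch of Return Batch Normalization (ReBN): per-turn discounted
# returns, normalized across the whole batch before use as advantages in
# a REINFORCE loss. The normalization details are an assumption, made
# only to illustrate the idea of per-turn credit assignment.
import math

def discounted_returns(rewards, gamma=1.0):
    """Per-turn returns G_t = sum_{k>=t} gamma^(k-t) * r_k for one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

def rebn_advantages(batch_rewards, gamma=1.0, eps=1e-8):
    """Flatten per-turn returns across the batch, then normalize them."""
    returns = [g for ep in batch_rewards for g in discounted_returns(ep, gamma)]
    mean = sum(returns) / len(returns)
    std = math.sqrt(sum((g - mean) ** 2 for g in returns) / len(returns))
    return [(g - mean) / (std + eps) for g in returns]

# Two episodes with dense per-turn rewards: each turn gets its own
# normalized advantage, rather than one score per whole response.
advs = rebn_advantages([[0.0, 0.0, 1.0], [1.0, 0.0]])
```

Because every turn keeps its own return, a late mistake after an early reward is penalized at the turn where it happened, which is the credit-assignment advantage the abstract claims over response-level scoring.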