
ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

February 2, 2026
Authors: Jie Xiao, Meng Chen, Qingnan Ren, Jingwei Song, Jiaqi Huang, Yangshen Deng, Chris Tong, Wanyi Chen, Suli Wang, Ziqian Bi, Shuo Lu, Yiqun Duan, Xu Wang, Rymon Yu, Ween Yang, Lynn Ai, Eric Yang, Bill Shi, Song Jingwei
cs.AI

Abstract

Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized learning. Distributing rollout execution offers opportunities to leverage more cost-efficient inference resources, but introduces challenges in wide-area coordination and policy dissemination. We present ECHO-2, a distributed RL framework for post-training with remote inference workers and non-negligible dissemination latency. ECHO-2 combines centralized learning with distributed rollouts and treats bounded policy staleness as a user-controlled parameter, enabling rollout generation, dissemination, and training to overlap. We introduce an overlap-based capacity model that relates training time, dissemination latency, and rollout throughput, yielding a practical provisioning rule for sustaining learner utilization. To mitigate dissemination bottlenecks and lower cost, ECHO-2 employs peer-assisted pipelined broadcast and cost-aware activation of heterogeneous workers. Experiments on GRPO post-training of 4B and 8B models under real wide-area bandwidth regimes show that ECHO-2 significantly improves cost efficiency while preserving RL reward comparable to strong baselines.
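
The abstract does not state the capacity model itself, so the following is only a back-of-the-envelope sketch of the kind of provisioning reasoning it describes, not the authors' formula. It assumes a simple steady state: the learner consumes one batch of rollouts per training step, every worker produces rollouts at the same rate, and a policy push stays hidden as long as it completes within the staleness window, taken here as the staleness bound times the training step time. All function and parameter names are hypothetical.

```python
import math


def min_workers(batch_rollouts: int, train_time_s: float,
                worker_rollouts_per_s: float) -> int:
    """Smallest number of workers whose combined throughput supplies
    one full batch per training step (assumed steady-state condition)."""
    required_rate = batch_rollouts / train_time_s  # rollouts per second the learner consumes
    return math.ceil(required_rate / worker_rollouts_per_s)


def dissemination_hidden(dissemination_s: float, train_time_s: float,
                         max_staleness: int) -> bool:
    """True if a policy push finishes inside the staleness window.

    Assumption: rollouts may lag the learner by at most `max_staleness`
    policy versions, so the window is max_staleness * train_time_s.
    """
    return dissemination_s <= max_staleness * train_time_s


if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the paper.
    print(min_workers(batch_rollouts=512, train_time_s=120.0,
                      worker_rollouts_per_s=0.5))
    print(dissemination_hidden(dissemination_s=200.0, train_time_s=120.0,
                               max_staleness=2))
```

Under these assumptions, raising the staleness bound widens the window in which dissemination can overlap with training, while the worker count is set by the learner's consumption rate alone.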