
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

July 2, 2025
Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu
cs.AI

Abstract

Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges with complex dataflows and the resulting resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled with LLM training or inference engines, making it difficult to support custom-designed engines. To address these challenges, we propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training. Specifically, we introduce a distributed data storage and transfer module that provides unified data management and fine-grained scheduling in a fully streamed manner. This architecture inherently facilitates automated pipeline overlapping among RL tasks and dynamic load balancing. Moreover, we propose a producer-consumer-based asynchronous workflow engineered to minimize computational idleness by strategically deferring the parameter update process within a staleness threshold. Finally, the core capabilities of AsyncFlow are architecturally decoupled from the underlying training and inference engines and encapsulated by service-oriented user interfaces, offering a modular and customizable user experience. Extensive experiments demonstrate an average throughput improvement of 1.59x over the state-of-the-art baseline. The architecture presented in this work provides actionable insights for next-generation RL training system designs.
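
The fully streamed producer-consumer dataflow described in the abstract can be pictured with a short sketch. The following is a minimal illustration, not the paper's actual API: rollout workers push each finished sample into a shared buffer as soon as it is ready, and the trainer consumes micro-batches the moment enough samples arrive, so generation and training overlap instead of synchronizing on full batches. All names (`sample_buffer`, `rollout_worker`, `trainer`) are hypothetical.

```python
# A minimal sketch (not AsyncFlow's actual API) of a fully streamed
# producer-consumer dataflow between rollout and training tasks.
import queue
import threading

# Shared, bounded buffer standing in for the distributed data storage and
# transfer module; the maxsize provides natural back-pressure.
sample_buffer: queue.Queue = queue.Queue(maxsize=4096)

def rollout_worker(worker_id: int, num_prompts: int) -> None:
    """Producer: streams each finished sample immediately, instead of
    blocking until a whole generation batch is complete."""
    for i in range(num_prompts):
        sample = {"prompt_id": f"{worker_id}-{i}", "tokens": [], "reward": 0.0}
        sample_buffer.put(sample)  # fine-grained, per-sample transfer

def trainer(micro_batch_size: int, num_steps: int) -> None:
    """Consumer: pulls a micro-batch as soon as enough samples arrive,
    so training overlaps with ongoing generation."""
    for step in range(num_steps):
        batch = [sample_buffer.get() for _ in range(micro_batch_size)]
        # ... compute the RL loss on `batch` and apply a gradient step ...
        print(f"step {step}: trained on {len(batch)} streamed samples")

if __name__ == "__main__":
    producers = [threading.Thread(target=rollout_worker, args=(w, 64)) for w in range(4)]
    consumer = threading.Thread(target=trainer, args=(32, 8))
    for t in producers:
        t.start()
    consumer.start()
    for t in producers + [consumer]:
        t.join()
```

In this toy version, 4 workers produce 256 samples and the trainer consumes them as 8 micro-batches of 32; a distributed implementation would replace the in-process queue with networked storage and scheduling, but the overlap pattern is the same.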
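The staleness-bounded deferral can likewise be sketched in a few lines. This is an assumption-laden illustration of the idea as stated in the abstract, not AsyncFlow's implementation: rollout keeps generating with an older weight version while the trainer advances, and the blocking weight synchronization is deferred until the version gap reaches a threshold.

```python
# Hypothetical sketch of staleness-bounded deferred parameter updates; the
# threshold value and all names are illustrative assumptions, not the paper's.
MAX_STALENESS = 2  # rollout may lag the trainer by at most this many versions

def maybe_sync_weights(train_version: int, rollout_version: int, push_weights) -> int:
    """Defer the (blocking) weight sync while the version gap stays within
    the staleness threshold, keeping both trainer and rollout busy."""
    if train_version - rollout_version >= MAX_STALENESS:
        push_weights()           # sync point: rollout adopts the latest weights
        return train_version     # rollout's version catches up
    return rollout_version       # keep generating slightly off-policy samples

# Example: the trainer finishes version 3 while rollout still runs version 1,
# so the gap (2) hits the threshold and a sync is triggered.
new_rollout_version = maybe_sync_weights(3, 1, lambda: print("weights pushed"))
```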