OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
May 20, 2024
Authors: Jian Hu, Xibin Wu, Weixun Wang, Xianyu, Dehao Zhang, Yu Cao
cs.AI
Abstract
As large language models (LLMs) continue to grow in line with scaling laws,
reinforcement learning from human feedback (RLHF) has gained significant
attention due to its outstanding performance. However, unlike pretraining or
fine-tuning a single model, scaling RLHF to large language models poses
coordination challenges across four models. We present OpenRLHF, an
open-source framework enabling efficient RLHF scaling. Unlike existing RLHF
frameworks that co-locate the four models on the same GPUs, OpenRLHF
redesigns model scheduling for models beyond 70B parameters using Ray, vLLM,
and DeepSpeed, which improves resource utilization and supports diverse
training approaches. Integrating seamlessly with Hugging Face, OpenRLHF
provides an out-of-the-box solution with optimized algorithms and launch
scripts, making it user-friendly. OpenRLHF implements RLHF, DPO, rejection
sampling, and other alignment techniques. To support state-of-the-art LLM
development, OpenRLHF's code is available at
https://github.com/OpenLLMAI/OpenRLHF.