DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Abstract

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
