DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Abstract

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
