

FP8-LM: Training FP8 Large Language Models

October 27, 2023
Authors: Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Peng Cheng
cs.AI

Abstract

In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables in LLM training, such as gradients and optimizer states, can employ low-precision data formats without compromising model accuracy or requiring changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for training LLMs. This framework offers three levels of FP8 utilization to streamline mixed-precision and distributed parallel training for LLMs, incrementally incorporating 8-bit gradients, optimizer states, and distributed learning. Experimental results show that, during training of a GPT-175B model on the H100 GPU platform, our FP8 mixed-precision training framework not only achieved a remarkable 42% reduction in real memory usage but also ran 64% faster than the widely adopted BF16 framework (i.e., Megatron-LM), surpassing the speed of the Nvidia Transformer Engine by 17%. This largely reduces the training costs of large foundation models. Furthermore, our FP8 mixed-precision training methodology is generic: it can be seamlessly applied to other tasks such as LLM instruction tuning and reinforcement learning with human feedback, offering savings in fine-tuning expenses. Our FP8 low-precision training framework is open-sourced at https://github.com/Azure/MS-AMP (aka.ms/MS.AMP).
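
To make the core idea more concrete, below is a minimal sketch of per-tensor FP8 quantization with a dynamic scaling factor, the kind of conversion applied to gradients and optimizer states in FP8 mixed-precision training. This is an illustration only, not the paper's or MS-AMP's implementation: it assumes PyTorch ≥ 2.1 with native float8 dtypes, and the helper names (`compute_fp8_scale`, `quantize_to_fp8`, `dequantize_from_fp8`) are hypothetical.

```python
# Illustrative sketch of per-tensor FP8 (E4M3) scaling; not the MS-AMP implementation.
# Assumes PyTorch >= 2.1, which provides the torch.float8_e4m3fn dtype.
import torch

# Largest finite magnitude representable in the E4M3 FP8 format.
FP8_E4M3_MAX = 448.0


def compute_fp8_scale(tensor: torch.Tensor) -> torch.Tensor:
    """Per-tensor scale so the largest value maps near the FP8 range limit."""
    amax = tensor.abs().max().clamp(min=1e-12)
    return FP8_E4M3_MAX / amax


def quantize_to_fp8(tensor: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Scale a higher-precision tensor and cast it to FP8; return the FP8 tensor and its scale."""
    scale = compute_fp8_scale(tensor)
    fp8_tensor = (tensor * scale).to(torch.float8_e4m3fn)
    return fp8_tensor, scale


def dequantize_from_fp8(fp8_tensor: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate higher-precision tensor from FP8 storage."""
    return fp8_tensor.to(torch.float32) / scale


# Example: round-trip a gradient-like tensor through FP8 storage.
grad = torch.randn(1024, 1024) * 0.01
grad_fp8, scale = quantize_to_fp8(grad)
grad_restored = dequantize_from_fp8(grad_fp8, scale)
```

Storing gradients and optimizer states this way (FP8 payload plus a small per-tensor scale) is what makes the memory and bandwidth savings described in the abstract possible, at the cost of tracking and updating the scaling factors during training.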