FP8-LM: Training FP8 Large Language Models

Abstract

In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables in LLM training, such as gradients and optimizer states, can use low-precision data formats without compromising model accuracy and without requiring any changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for training LLMs. The framework offers three levels of FP8 utilization to streamline mixed-precision and distributed parallel training for LLMs, incrementally incorporating 8-bit gradients, optimizer states, and distributed learning. Experimental results show that, when training a GPT-175B model on an H100 GPU platform, our FP8 mixed-precision training framework reduced real memory usage by 42% and ran 64% faster than the widely adopted BF16 framework (i.e., Megatron-LM), surpassing the speed of NVIDIA Transformer Engine by 17%. This substantially reduces the cost of training large foundation models. Furthermore, our FP8 mixed-precision training methodology is generic: it can be applied seamlessly to other tasks such as LLM instruction tuning and reinforcement learning with human feedback, reducing fine-tuning costs. Our FP8 low-precision training framework is open-sourced at aka.ms/MS.AMP (https://github.com/Azure/MS-AMP).
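The central claim above is that tensors such as gradients can be stored in FP8 without hurting accuracy. A common way to make an 8-bit float usable for tensors whose values are tiny (as gradients often are) is per-tensor scaling: multiply the tensor by a factor so its largest magnitude lands near the top of the FP8 range, cast, and keep the factor for dequantization. The pure-Python sketch below simulates this, assuming the E4M3 variant (4 exponent bits, 3 mantissa bits, largest finite value 448) with saturation, and a scale computed from the current tensor's maximum absolute value. The helpers `round_to_e4m3`, `quantize_per_tensor`, and `dequantize` are hypothetical illustrations, not MS-AMP's API; the formats and scaling strategy the framework actually uses for each tensor may differ.

```python
import math

# Assumed FP8 format: E4M3 (the "E4M3FN" variant: no infinities, max finite 448).
E4M3_MAX = 448.0      # largest finite E4M3 magnitude
E4M3_MIN_EXP = -6     # exponent of the smallest normal value, 2**-6
E4M3_MANT_BITS = 3    # explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round a float to the nearest E4M3 value (round-half-to-even, saturating)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= E4M3_MAX:
        return sign * E4M3_MAX             # saturate instead of overflowing
    _, exp = math.frexp(a)                 # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, E4M3_MIN_EXP)         # floor(log2(a)), clamped for subnormals
    step = 2.0 ** (e - E4M3_MANT_BITS)     # spacing between adjacent E4M3 values
    q = round(a / step) * step             # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_per_tensor(values):
    """Scale so max |value| maps to E4M3_MAX, then round each element to E4M3."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def dequantize(q_values, scale):
    """Undo the per-tensor scaling to recover approximate original values."""
    return [q / scale for q in q_values]


if __name__ == "__main__":
    grads = [1e-5, -3e-4, 2.5e-3]          # illustrative gradient magnitudes
    # Direct cast: values below the smallest E4M3 subnormal (2**-9) become zero.
    print([round_to_e4m3(g) for g in grads])
    # Scaled cast: every value keeps roughly 3 mantissa bits of precision.
    q, scale = quantize_per_tensor(grads)
    print(dequantize(q, scale))
```

Without scaling, the two smallest values underflow to zero; with a per-tensor scale, every element survives with a relative error of at most about 2**-4, at the cost of storing one extra higher-precision scalar per tensor.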