Efficient Post-training Quantization with FP8 Formats

Abstract

Recent advances in deep learning methods such as LLMs and Diffusion models have created a need for improved quantization methods that can meet the computational demands of these modern architectures while maintaining accuracy. Towards this goal, we study the advantages of FP8 data formats for post-training quantization across 75 unique network architectures covering a wide range of tasks, including machine translation, language modeling, text generation, image classification, generation, and segmentation. We examine three different FP8 representations (E5M2, E4M3, and E3M4) to study the effects of varying degrees of trade-off between dynamic range and precision on model accuracy. Based on our extensive study, we developed a quantization workflow that generalizes across different network architectures. Our empirical results show that FP8 formats outperform INT8 in multiple aspects, including workload coverage (92.64% vs. 65.87%), model accuracy and suitability for a broader range of operations. Furthermore, our findings suggest that E4M3 is better suited for NLP models, whereas E3M4 performs marginally better than E4M3 on computer vision tasks. The code is publicly available on Intel Neural Compressor: https://github.com/intel/neural-compressor.
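The trade-off between dynamic range and precision across the three formats can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper or from Intel Neural Compressor. It assumes plain IEEE-754-style conventions (bias of 2^(e-1) - 1, top exponent code reserved for infinities and NaNs, gradual underflow). Deployed FP8 encodings may reclaim the reserved exponent code to extend the range, so the maxima printed here can be smaller than those of a specific FP8 specification. The names `FloatFormatStats` and `ieee_style_stats` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormatStats:
    name: str
    max_normal: float     # largest finite value
    min_subnormal: float  # smallest positive value
    epsilon: float        # spacing between 1.0 and the next representable value


def ieee_style_stats(exp_bits: int, man_bits: int) -> FloatFormatStats:
    """Range and precision of a 1-sign/exp_bits/man_bits float.

    Assumption: IEEE-754-style encoding, with the all-ones exponent
    code reserved for inf/NaN. Real FP8 variants may differ.
    """
    bias = 2 ** (exp_bits - 1) - 1
    # Largest usable exponent code is all-ones minus one.
    max_exp = (2 ** exp_bits - 2) - bias
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    # Subnormals use exponent (1 - bias) with no implicit leading one.
    min_subnormal = 2.0 ** (1 - bias - man_bits)
    epsilon = 2.0 ** -man_bits
    return FloatFormatStats(
        name=f"E{exp_bits}M{man_bits}",
        max_normal=max_normal,
        min_subnormal=min_subnormal,
        epsilon=epsilon,
    )


if __name__ == "__main__":
    for exp_bits, man_bits in [(5, 2), (4, 3), (3, 4)]:
        s = ieee_style_stats(exp_bits, man_bits)
        print(f"{s.name}: max={s.max_normal:g}, "
              f"min_subnormal={s.min_subnormal:g}, eps={s.epsilon:g}")
```

Under these assumptions, each bit moved from the exponent to the mantissa halves the relative spacing between neighboring values (epsilon goes from 0.25 to 0.125 to 0.0625) while sharply shrinking the representable range, which is the trade-off the three formats are meant to probe.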
