Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Abstract

Large language models (LLMs) have achieved impressive performance across various domains. However, the substantial hardware resources required to train them are a significant barrier to efficiency and scalability. To mitigate this challenge, low-precision training techniques have been widely adopted and have brought notable improvements in training efficiency. Despite these gains, low-precision training involves several components, such as weights, activations, and gradients, each of which can be represented in a different numerical format. The resulting diversity has fragmented low-precision training research, making it difficult for researchers to gain a unified overview of the field. This survey provides a comprehensive review of existing low-precision training methods. To organize these approaches systematically, we group them into three primary categories according to their underlying numerical format, a key factor that influences hardware compatibility, computational efficiency, and ease of reference for readers. The categories are: (1) fixed-point and integer-based methods, (2) floating-point-based methods, and (3) customized format-based methods. Additionally, we discuss quantization-aware training approaches, which share key similarities with low-precision training during forward propagation. Finally, we highlight several promising research directions for advancing this field. A collection of the papers discussed in this survey is available at https://github.com/Hao840/Awesome-Low-Precision-Training.
