Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
May 2, 2025
Authors: Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao
cs.AI
Abstract
Large language models (LLMs) have achieved impressive performance across
various domains. However, the substantial hardware resources required for their
training present a significant barrier to efficiency and scalability. To
mitigate this challenge, low-precision training techniques have been widely
adopted, leading to notable advancements in training efficiency. Despite these
gains, low-precision training involves several components, such
as weights, activations, and gradients, each of which can be
represented in different numerical formats. The resulting diversity has created
a fragmented landscape in low-precision training research, making it difficult
for researchers to gain a unified overview of the field. This survey provides a
comprehensive review of existing low-precision training methods. To
systematically organize these approaches, we categorize them into three primary
groups based on their underlying numerical formats, a key factor
influencing hardware compatibility, computational efficiency, and ease of
reference for readers. The categories are: (1) fixed-point and integer-based
methods, (2) floating-point-based methods, and (3) customized format-based
methods. Additionally, we discuss quantization-aware training approaches, which
share key similarities with low-precision training during forward propagation.
Finally, we highlight several promising research directions to advance this
field. A collection of the papers discussed in this survey is available at
https://github.com/Hao840/Awesome-Low-Precision-Training.
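To make the noted forward-pass similarity between quantization-aware training and integer-based low-precision training concrete, below is a minimal sketch of symmetric per-tensor int8 "fake quantization" in NumPy. This is an illustrative example, not code from the survey: the function name and the per-tensor scaling scheme are assumptions of the sketch, and practical systems often use finer-grained (e.g., per-channel or per-block) scales.

```python
import numpy as np

def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 quantize-dequantize ("fake quantization").

    Values are rounded onto the integer grid [-127, 127] and immediately
    mapped back to float32, so downstream operations observe the rounding
    error that real int8 execution would introduce.
    """
    scale = np.max(np.abs(x)) / 127.0
    if scale == 0.0:  # all-zero tensor: nothing to quantize
        return x
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

# A linear layer's forward pass with quantized weights and activations.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # weight matrix
a = rng.standard_normal((8, 256)).astype(np.float32)    # activation batch
out = fake_quantize_int8(a) @ fake_quantize_int8(w)
print(out.shape)  # (8, 256)
```

The difference lies in where the savings come from: in integer-based low-precision training the matrix multiply itself runs on low-precision hardware units, whereas quantization-aware training only simulates the rounding error within an otherwise high-precision forward pass, which is the similarity the abstract refers to.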