Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Abstract

Large language models (LLMs) have achieved impressive performance across various domains. However, the substantial hardware resources required for their training present a significant barrier to efficiency and scalability. To mitigate this challenge, low-precision training techniques have been widely adopted, leading to notable advances in training efficiency. Despite these gains, low-precision training involves several components (such as weights, activations, and gradients), each of which can be represented in a different numerical format. The resulting diversity has fragmented low-precision training research, making it difficult for researchers to gain a unified overview of the field. This survey provides a comprehensive review of existing low-precision training methods. To organize these approaches systematically, we categorize them into three primary groups based on their underlying numerical format, a key factor influencing hardware compatibility, computational efficiency, and ease of reference for readers. The categories are: (1) fixed-point and integer-based methods, (2) floating-point-based methods, and (3) customized format-based methods. Additionally, we discuss quantization-aware training approaches, which share key similarities with low-precision training during forward propagation. Finally, we highlight several promising research directions to advance this field. A collection of the papers discussed in this survey is available at https://github.com/Hao840/Awesome-Low-Precision-Training.
