Optimizing Large Language Model Training Using FP4 Quantization
January 28, 2025
Authors: Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng
cs.AI
Abstract
The growing computational demands of training large language models (LLMs)
necessitate more efficient methods. Quantized training presents a promising
solution by enabling low-bit arithmetic operations to reduce these costs. While
FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge
due to significant quantization errors and limited representational capacity.
This work introduces the first FP4 training framework for LLMs, addressing
these challenges with two key innovations: a differentiable quantization
estimator for precise weight updates and an outlier clamping and compensation
strategy to prevent activation collapse. To ensure stability, the framework
integrates a mixed-precision training scheme and vector-wise quantization.
Experimental results demonstrate that our FP4 framework achieves accuracy
comparable to BF16 and FP8, with minimal degradation, scaling effectively to
13B-parameter LLMs trained on up to 100B tokens. With the emergence of
next-generation hardware supporting FP4, our framework sets a foundation for
efficient ultra-low precision training.
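
The abstract mentions vector-wise quantization together with a differentiable quantization estimator for weight updates, but gives no implementation details. The sketch below is only a rough illustration in PyTorch of what per-row FP4 (E2M1) quantization can look like, with a plain straight-through gradient standing in for the paper's differentiable estimator; the grid, the scaling rule, and the names FP4_E2M1_GRID, fp4_quantize, and FP4QuantSTE are assumptions, not the authors' code.

# Illustrative sketch only (assumed details, not the paper's implementation).
import torch

# Non-negative magnitudes representable in the E2M1 FP4 format; signs are handled separately.
FP4_E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize(x: torch.Tensor) -> torch.Tensor:
    """Quantize each row of x to FP4 values using a per-row (vector-wise) scale."""
    grid = FP4_E2M1_GRID.to(device=x.device, dtype=x.dtype)
    # Vector-wise scaling: map each row's max magnitude onto the largest FP4 value (6.0).
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / grid[-1]
    x_scaled = x / scale
    # Round every element to the nearest representable FP4 magnitude, preserving its sign.
    idx = (x_scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    return torch.sign(x_scaled) * grid[idx] * scale

class FP4QuantSTE(torch.autograd.Function):
    """Straight-through stand-in for the paper's differentiable quantization
    estimator: quantize in the forward pass, pass gradients through unchanged."""
    @staticmethod
    def forward(ctx, x):
        return fp4_quantize(x)
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

# Usage sketch: w_q = FP4QuantSTE.apply(weight)  # FP4-quantized weights, pass-through gradients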
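For the outlier clamping and compensation strategy on activations, one possible reading, again only a sketch under assumed details (the 0.999 quantile, the per-tensor threshold, and the function name are not from the paper): clamp activations to a high-quantile magnitude threshold before low-bit quantization and keep the clipped residual so the error can be compensated in higher precision.

# Illustrative sketch only (assumed details, not the paper's implementation).
import torch

def clamp_and_compensate(x: torch.Tensor, q: float = 0.999):
    """Clamp outliers in x and return (clamped tensor, residual for compensation)."""
    # Per-tensor magnitude threshold taken from a high quantile of |x| (assumption).
    threshold = torch.quantile(x.abs().float().flatten(), q).item()
    x_clamped = x.clamp(min=-threshold, max=threshold)
    # The residual carries only the clipped-off outlier mass; applying it separately
    # in higher precision compensates the clamping error.
    residual = x - x_clamped
    return x_clamped, residual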