Metis: Training Large Language Models with Advanced Low-Bit Quantization
August 30, 2025
Authors: Hengjie Cao, Mengyi Chen, Yifeng Yang, Ruijun Huang, Fang Dong, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Yuan Cheng, Fan Wu, Fan Yang, Tun Lu, Ning Gu, Li Shang
cs.AI
Abstract
This work identifies anisotropic parameter distributions as a fundamental barrier to training large language models (LLMs) with low-bit quantization: a few dominant singular values create wide numerical ranges that conflict with the inherent bias of block-wise quantization. This bias disproportionately preserves high-magnitude values while discarding smaller ones, causing training instability and poor model performance. To overcome this barrier, this work introduces Metis, a training framework that combines (i) spectral decomposition with random embedding to efficiently disentangle dominant from long-tail components, compressing broad distributions into quantization-friendly narrow ranges; (ii) adaptive learning rates in the spectral domain to amplify underrepresented directions and better capture diverse features critical for performance; and (iii) a dual-range regularizer that jointly constrains numerical precision and parameter range distribution, ensuring stable, unbiased low-bit training. With Metis, FP8 training surpasses FP32 baselines, and FP4 training achieves accuracy comparable to FP32, paving the way for robust and scalable LLM training under advanced low-bit quantization. The code implementation of Metis is available at: https://github.com/typename-yyf/Metis-quantization.
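Component (i) pairs spectral decomposition with random embedding to separate dominant singular components from the long tail. As a hedged illustration of that general mechanism, the sketch below uses a standard randomized (Halko-style) SVD: project the weight matrix onto a random subspace, recover the top-k component, and keep the narrow-range residual, which is the part friendly to block-wise quantization. The rank `k`, the oversampling amount, and the demo matrix are illustrative assumptions, not Metis's actual procedure or hyperparameters.

```python
# Minimal sketch: split a weight matrix into a low-rank dominant part and
# a narrow-range long-tail residual via a random embedding (randomized SVD).
# Illustrative only; not the paper's exact algorithm.
import numpy as np

def split_dominant(W, k=8, oversample=4, seed=0):
    """Approximate the top-k spectral component of W using a random
    embedding; return (dominant, residual)."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    # Random embedding: project columns onto a (k + oversample)-dim subspace.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(W @ Omega)            # orthonormal basis for the range
    B = Q.T @ W                               # small (k+p) x n matrix
    U_hat, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    dominant = (U[:, :k] * S[:k]) @ Vt[:k]    # top-k reconstruction
    residual = W - dominant                   # long-tail component
    return dominant, residual

# Demo: a Gaussian matrix plus one large rank-1 spike (the "dominant" part).
rng = np.random.default_rng(1)
u, v = rng.standard_normal(512), rng.standard_normal(512)
W = rng.standard_normal((512, 512)) + 40.0 * np.outer(u, v)
dom, res = split_dominant(W)
print(np.abs(W).max(), np.abs(res).max())    # residual range is far narrower
```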
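Component (ii) applies adaptive learning rates in the spectral domain to amplify underrepresented directions. One plausible reading, sketched below, rescales the gradient per singular direction so that directions aligned with small singular values receive larger steps; the inverse-power schedule `(S.max() / S) ** alpha` and the parameter `alpha` are hypothetical choices for illustration, not the schedule used by Metis.

```python
# Hedged sketch: per-direction step sizes in the spectral coordinates of W.
# Directions with small singular values get proportionally larger steps.
import numpy as np

def spectral_lr_step(W, G, base_lr=1e-2, alpha=0.5, eps=1e-8):
    """One gradient step on W with direction-wise scaling in the spectral
    domain (assumed inverse-power rule; illustrative only)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    G_spec = U.T @ G @ Vt.T                   # gradient in spectral coordinates
    scale = (S.max() / (S + eps)) ** alpha    # boost weak (long-tail) directions
    G_scaled = U @ (scale[:, None] * G_spec) @ Vt
    return W - base_lr * G_scaled
```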
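Component (iii), the dual-range regularizer, is described only at a high level. The snippet below shows one hedged interpretation in PyTorch: penalize magnitudes above the largest representable low-bit value (6.0 happens to be the FP4 E2M1 maximum) and nonzero magnitudes small enough that block-wise quantization would flush them to zero. The thresholds and weights (`hi`, `lo`, `lam_hi`, `lam_lo`) are hypothetical, not the regularizer defined in the paper.

```python
# Hypothetical dual-range penalty: one term discourages overflow past the
# representable range, the other discourages underflow toward zero.
import torch

def dual_range_penalty(w, hi=6.0, lo=1e-2, lam_hi=1.0, lam_lo=0.1):
    a = w.abs()
    over = torch.relu(a - hi)     # mass beyond the upper representable range
    under = torch.relu(lo - a)    # mass below the smallest useful magnitude
    return lam_hi * over.pow(2).mean() + lam_lo * under.pow(2).mean()

w = torch.randn(1024, 1024, requires_grad=True)
task_loss = w.pow(2).mean()       # stand-in for the actual training loss
(task_loss + dual_range_penalty(w)).backward()
```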