Metis: Training Large Language Models with Advanced Low-Bit Quantization
August 30, 2025
Authors: Hengjie Cao, Mengyi Chen, Yifeng Yang, Ruijun Huang, Fang Dong, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Yuan Cheng, Fan Wu, Fan Yang, Tun Lu, Ning Gu, Li Shang
cs.AI
Abstract
This work identifies anisotropic parameter distributions as a fundamental
barrier to training large language models (LLMs) with low-bit quantization: a
few dominant singular values create wide numerical ranges that conflict with
the inherent bias of block-wise quantization. This bias disproportionately
preserves high-magnitude values while discarding smaller ones, causing training
instability and low model performance. This work introduces Metis, a training
framework that combines (i) spectral decomposition with random embedding to
efficiently disentangle dominant from long-tail components, compressing broad
distributions into quantization-friendly narrow ranges; (ii) adaptive learning
rates in the spectral domain to amplify underrepresented directions and better
capture diverse features critical for performance; and (iii) a dual-range
regularizer that jointly constrains numerical precision and parameter range
distribution, ensuring stable, unbiased low-bit training. With Metis, FP8
training surpasses FP32 baselines, and FP4 training achieves accuracy
comparable to FP32, paving the way for robust and scalable LLM training under
advanced low-bit quantization. The code implementation for Metis is available
at: https://github.com/typename-yyf/Metis-quantization.
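
Below is a minimal, self-contained sketch (in PyTorch, not the authors' released code) of the two ideas the abstract pairs together: block-wise quantization, whose shared per-block scale favors large magnitudes and rounds small entries toward zero, and a spectral split that peels off the few dominant singular directions so the remaining long-tail component falls into a narrow, quantization-friendly range. The helper names `fake_quant_blockwise` and `spectral_split` are illustrative only, and simple symmetric 4-bit rounding stands in for the FP4/FP8 formats used in the paper.

```python
# Sketch under stated assumptions: symmetric integer rounding as a proxy for
# low-bit floating-point formats; the dominant low-rank part is kept at full
# precision while only the long-tail residual is quantized.
import torch


def fake_quant_blockwise(w: torch.Tensor, bits: int = 4, block: int = 64) -> torch.Tensor:
    """Simulate block-wise quantization: each block shares one scale set by its
    largest magnitude, so small entries in a wide-range block round toward zero."""
    flat = w.reshape(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    qmax = 2 ** (bits - 1) - 1
    q = torch.round(flat / scale * qmax).clamp(-qmax, qmax)
    return (q / qmax * scale).reshape(w.shape)


def spectral_split(w: torch.Tensor, rank: int = 16):
    """Separate the dominant singular directions (wide range, kept unquantized here)
    from the long-tail residual (narrow range, safe to quantize)."""
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    dominant = (u[:, :rank] * s[:rank]) @ vh[:rank]
    residual = w - dominant
    return dominant, residual


if __name__ == "__main__":
    torch.manual_seed(0)
    # An anisotropic matrix: a few large singular values on top of small noise.
    w = torch.randn(256, 16) @ torch.randn(16, 256) + 0.01 * torch.randn(256, 256)

    direct = fake_quant_blockwise(w)            # quantize the full matrix as-is
    dom, res = spectral_split(w)
    split = dom + fake_quant_blockwise(res)     # quantize only the narrow-range residual

    print("direct 4-bit error:", (w - direct).norm().item())
    print("split  4-bit error:", (w - split).norm().item())
```

On such an anisotropic matrix the residual-only quantization error is typically far smaller than quantizing the full matrix directly, since the per-block scales are no longer inflated by the few dominant components; this is the range-compression effect the abstract attributes to the spectral decomposition step.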