Metis: Training Large Language Models with Advanced Low-Bit Quantization
August 30, 2025
Authors: Hengjie Cao, Mengyi Chen, Yifeng Yang, Ruijun Huang, Fang Dong, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Yuan Cheng, Fan Wu, Fan Yang, Tun Lu, Ning Gu, Li Shang
cs.AI
Abstract
This work identifies anisotropic parameter distributions as a fundamental barrier to training large language models (LLMs) with low-bit quantization: a few dominant singular values create wide numerical ranges that conflict with the inherent bias of block-wise quantization. This bias disproportionately preserves high-magnitude values while discarding smaller ones, causing training instability and poor model performance. To overcome this barrier, this work introduces Metis, a training framework that combines (i) spectral decomposition with random embedding to efficiently disentangle dominant from long-tail components, compressing broad distributions into quantization-friendly narrow ranges; (ii) adaptive learning rates in the spectral domain to amplify underrepresented directions and better capture diverse features critical for performance; and (iii) a dual-range regularizer that jointly constrains numerical precision and parameter range distribution, ensuring stable, unbiased low-bit training. With Metis, FP8 training surpasses FP32 baselines, and FP4 training achieves accuracy comparable to FP32, paving the way for robust and scalable LLM training under advanced low-bit quantization. The code implementation of Metis is available at: https://github.com/typename-yyf/Metis-quantization.
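Component (i) pairs spectral decomposition with random embedding to separate dominant singular components from the long tail. As a hedged illustration of that general mechanism, the sketch below uses a standard randomized (Halko-style) SVD: project the weight matrix onto a random subspace, recover the top-k component, and keep the narrow-range residual, which is the part friendly to block-wise quantization. The rank `k`, the oversampling amount, and the demo matrix are illustrative assumptions, not Metis's actual procedure or hyperparameters.

```python
# Minimal sketch: split a weight matrix into a low-rank dominant part and
# a narrow-range long-tail residual via a random embedding (randomized SVD).
# Illustrative only; not the paper's exact algorithm.
import numpy as np

def split_dominant(W, k=8, oversample=4, seed=0):
    """Approximate the top-k spectral component of W using a random
    embedding; return (dominant, residual)."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    # Random embedding: project columns onto a (k + oversample)-dim subspace.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(W @ Omega)            # orthonormal basis for the range
    B = Q.T @ W                               # small (k+p) x n matrix
    U_hat, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    dominant = (U[:, :k] * S[:k]) @ Vt[:k]    # top-k reconstruction
    residual = W - dominant                   # long-tail component
    return dominant, residual

# Demo: a Gaussian matrix plus one large rank-1 spike (the "dominant" part).
rng = np.random.default_rng(1)
u, v = rng.standard_normal(512), rng.standard_normal(512)
W = rng.standard_normal((512, 512)) + 40.0 * np.outer(u, v)
dom, res = split_dominant(W)
print(np.abs(W).max(), np.abs(res).max())    # residual range is far narrower
```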
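Component (ii) applies adaptive learning rates in the spectral domain to amplify underrepresented directions. One plausible reading, sketched below, rescales the gradient per singular direction so that directions aligned with small singular values receive larger steps; the inverse-power schedule `(S.max() / S) ** alpha` and the parameter `alpha` are hypothetical choices for illustration, not the schedule used by Metis.

```python
# Hedged sketch: per-direction step sizes in the spectral coordinates of W.
# Directions with small singular values get proportionally larger steps.
import numpy as np

def spectral_lr_step(W, G, base_lr=1e-2, alpha=0.5, eps=1e-8):
    """One gradient step on W with direction-wise scaling in the spectral
    domain (assumed inverse-power rule; illustrative only)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    G_spec = U.T @ G @ Vt.T                   # gradient in spectral coordinates
    scale = (S.max() / (S + eps)) ** alpha    # boost weak (long-tail) directions
    G_scaled = U @ (scale[:, None] * G_spec) @ Vt
    return W - base_lr * G_scaled
```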
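Component (iii), the dual-range regularizer, is described only at a high level. The snippet below shows one hedged interpretation in PyTorch: penalize magnitudes above the largest representable low-bit value (6.0 happens to be the FP4 E2M1 maximum) and nonzero magnitudes small enough that block-wise quantization would flush them to zero. The thresholds and weights (`hi`, `lo`, `lam_hi`, `lam_lo`) are hypothetical, not the regularizer defined in the paper.

```python
# Hypothetical dual-range penalty: one term discourages overflow past the
# representable range, the other discourages underflow toward zero.
import torch

def dual_range_penalty(w, hi=6.0, lo=1e-2, lam_hi=1.0, lam_lo=0.1):
    a = w.abs()
    over = torch.relu(a - hi)     # mass beyond the upper representable range
    under = torch.relu(lo - a)    # mass below the smallest useful magnitude
    return lam_hi * over.pow(2).mean() + lam_lo * under.pow(2).mean()

w = torch.randn(1024, 1024, requires_grad=True)
task_loss = w.pow(2).mean()       # stand-in for the actual training loss
(task_loss + dual_range_penalty(w)).backward()
```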