Scaling Laws for Floating Point Quantization Training

Abstract

Low-precision training is considered an effective strategy for reducing both training and downstream inference costs. Previous scaling laws for precision focus mainly on integer quantization; they pay less attention to the constituents of floating-point quantization and therefore do not fit LLM losses well in this scenario. Meanwhile, although floating-point quantization training is more commonly implemented in production, research on it has been relatively superficial. In this paper, we thoroughly explore how floating-point quantization targets, exponent bits, mantissa bits, and the calculation granularity of the scaling factor affect the performance of LLMs trained with floating-point quantization. In addition to presenting an accurate unified scaling law for floating-point quantization, we offer several suggestions for the community: (1) Exponent bits contribute slightly more to model performance than mantissa bits. We provide the optimal exponent-mantissa bit ratio for different bit widths, which hardware manufacturers can use as a future reference; (2) We discover that a critical data size emerges in low-precision LLM training. Training data beyond the critical data size degrades LLM performance rather than improving it; (3) The optimal floating-point quantization precision is directly proportional to the computational power, but across a wide range of computational power we estimate that the best cost-performance precision lies between 4 and 8 bits.
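To make the quantities named in the abstract concrete (exponent bits, mantissa bits, and the granularity at which the scaling factor is computed), here is a minimal sketch of simulated floating-point quantization. It is an illustration only, not the paper's implementation. The function names `fp_max`, `fp_quantize`, and `quantize_with_scale` are hypothetical, and the toy format assumes an IEEE-like exponent bias, subnormal support, saturation on overflow, and no codes reserved for infinity or NaN (so its E4M3 range differs from hardware FP8 variants that do reserve codes).

```python
import math

def fp_max(exp_bits, man_bits):
    """Largest finite magnitude of the toy format.
    Assumption: no exponent codes are reserved for inf/NaN."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 1) - bias
    return (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp

def fp_quantize(x, exp_bits, man_bits):
    """Round x to the nearest value representable with 1 sign bit,
    exp_bits exponent bits, and man_bits mantissa bits."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias  # exponent of the smallest normal number
    # frexp gives abs(x) = m * 2**exponent with 0.5 <= m < 1,
    # so floor(log2(abs(x))) is exponent - 1.
    _, exponent = math.frexp(abs(x))
    e = max(exponent - 1, min_exp)  # clamping yields subnormal spacing
    step = 2.0 ** (e - man_bits)    # gap between neighbours in this binade
    q = round(abs(x) / step) * step # Python's round() breaks ties to even
    q = min(q, fp_max(exp_bits, man_bits))  # saturate instead of overflowing
    return math.copysign(q, x)

def quantize_with_scale(values, exp_bits, man_bits):
    """Quantize a group of values that share one scaling factor.
    The scale maps the group's largest magnitude onto fp_max."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0 for _ in values]
    scale = amax / fp_max(exp_bits, man_bits)
    return [fp_quantize(v / scale, exp_bits, man_bits) * scale for v in values]

# Scaling-factor granularity: one scale for the whole tensor vs. one per row.
matrix = [[0.01, 0.02, 0.03], [10.0, 20.0, 30.0]]
flat = [v for row in matrix for v in row]

per_tensor = quantize_with_scale(flat, 2, 1)  # toy E2M1 (4-bit) format
per_row = [q for row in matrix for q in quantize_with_scale(row, 2, 1)]

err_tensor = sum(abs(a - b) for a, b in zip(flat, per_tensor))
err_row = sum(abs(a - b) for a, b in zip(flat, per_row))
print("per-tensor error:", err_tensor)
print("per-row error:", err_row)
```

In this toy example the single per-tensor scale is dominated by the large row, so the small row collapses to zero, while per-row scaling keeps it. Which granularity and which exponent-mantissa split work best in actual LLM training is exactly what the paper's scaling law addresses; the sketch only shows what those knobs mean.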
