Microscaling Data Formats for Deep Learning
October 16, 2023
Authors: Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, Michael Siu, Pradeep Dubey, Paulius Micikevicius, Maxim Naumov, Colin Verilli, Ralph Wittig, Eric Chung
cs.AI
Abstract
Narrow bit-width data formats are key to reducing the computational and
storage costs of modern deep learning applications. This paper evaluates
Microscaling (MX) data formats that combine a per-block scaling factor with
narrow floating-point and integer types for individual elements. MX formats
balance the competing needs of hardware efficiency, model accuracy, and user
friction. Empirical results on over two dozen benchmarks demonstrate the
practicality of MX data formats as a drop-in replacement for baseline FP32 for
AI inference and training with low user friction. We also show the first
AI inference and training with low user friction. We also show the first
instance of training generative language models at sub-8-bit weights,
activations, and gradients with minimal accuracy loss and no modifications to
the training recipe.
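The core idea of an MX format is a small block of elements (e.g. 32 values) that shares a single power-of-two scaling factor, with each element stored in a narrow type. The sketch below illustrates this with signed-integer elements. It is a simplified, hypothetical MXINT-style quantizer for illustration only; the actual OCP MX specification defines the shared scale as an E8M0 exponent byte and prescribes specific element encodings (e.g. FP8, FP6, FP4, INT8) that this sketch does not reproduce.

```python
import numpy as np

def mx_int_quantize(block, elem_bits=8):
    """Quantize a block of floats with one shared power-of-two scale and
    narrow signed-integer elements.

    Simplified MXINT-style sketch, not the exact OCP MX encoding: the
    shared scale is chosen so the largest magnitude in the block fits in
    an `elem_bits`-bit signed integer.
    """
    block = np.asarray(block, dtype=np.float32)
    amax = float(np.max(np.abs(block)))
    if amax == 0.0:
        return np.zeros(block.shape, dtype=np.int32), 1.0
    # Power-of-two scale: amax / scale lands in [2**(elem_bits-2), 2**(elem_bits-1)).
    exp = int(np.floor(np.log2(amax))) - (elem_bits - 2)
    scale = 2.0 ** exp
    qmax = 2 ** (elem_bits - 1) - 1
    # Round to nearest integer element and clip to the representable range.
    q = np.clip(np.rint(block / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def mx_dequantize(q, scale):
    """Reconstruct approximate float values from quantized elements."""
    return q.astype(np.float32) * scale

# A 32-element block, as in the MX formats evaluated in the paper.
x = np.linspace(-1.0, 1.0, 32, dtype=np.float32)
q, scale = mx_int_quantize(x, elem_bits=8)
x_hat = mx_dequantize(q, scale)
# Rounding error per element is bounded by half the shared scale step.
assert np.max(np.abs(x_hat - x)) <= scale / 2 + 1e-7
```

Storing one shared scale per block amortizes its cost over the whole block while the narrow per-element type keeps arithmetic and memory cheap, which is the hardware-efficiency/accuracy trade-off the abstract describes.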