VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
August 30, 2024
Authors: Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang
cs.AI
Abstract
Diffusion Transformer models (DiTs) have transitioned the network
architecture from traditional UNets to transformers, demonstrating exceptional
capabilities in image generation. Although DiTs have been widely applied to
high-definition video generation tasks, their large parameter size hinders
inference on edge devices. Vector quantization (VQ) can decompose model weights
into a codebook and assignments, allowing extreme weight quantization and
significantly reducing memory usage. In this paper, we propose VQ4DiT, a fast
post-training vector quantization method for DiTs. We found that traditional VQ
methods calibrate only the codebook without calibrating the assignments. This
causes weight sub-vectors to be incorrectly mapped to the same assignment,
providing inconsistent gradients to the codebook and yielding suboptimal
results. To address this challenge, VQ4DiT computes a candidate assignment
set for each weight sub-vector based on Euclidean distance and reconstructs the
sub-vector based on the weighted average. Then, using the zero-data and
block-wise calibration method, the optimal assignment from the set is
efficiently selected while calibrating the codebook. VQ4DiT quantizes a DiT
XL/2 model on a single NVIDIA A100 GPU in 20 minutes to 5 hours, depending
on the quantization settings. Experiments show that VQ4DiT
establishes a new state-of-the-art in model size and performance trade-offs,
quantizing weights to 2-bit precision while retaining acceptable image
generation quality.
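For illustration, below is a minimal NumPy sketch (not the authors' implementation) of two ideas named in the abstract: decomposing a weight matrix into sub-vectors quantized against a codebook with assignments, and building a per-sub-vector candidate assignment set by Euclidean distance with a weighted-average reconstruction. The sub-vector dimension, codebook size, number of candidates, random-sampling codebook initialization, and the softmax-over-negative-distance weighting are illustrative assumptions, not values or choices from the paper.

```python
import numpy as np

def vq_candidate_decompose(weight, sub_dim=4, codebook_size=256, num_candidates=3):
    """Toy VQ decomposition: codebook plus per-sub-vector candidate assignments."""
    # Split the weight matrix into (num_sub_vectors, sub_dim) sub-vectors.
    sub_vectors = weight.reshape(-1, sub_dim)

    # Illustrative codebook init: sample existing sub-vectors as codewords
    # (a real pipeline would typically run k-means here).
    rng = np.random.default_rng(0)
    idx = rng.choice(len(sub_vectors), codebook_size, replace=False)
    codebook = sub_vectors[idx].copy()

    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = ((sub_vectors ** 2).sum(1, keepdims=True)
          - 2.0 * sub_vectors @ codebook.T
          + (codebook ** 2).sum(1))

    # Candidate assignment set: the num_candidates nearest codewords per sub-vector.
    candidates = np.argsort(d2, axis=1)[:, :num_candidates]

    # Reconstruct each sub-vector as a weighted average of its candidate codewords;
    # the softmax-over-negative-distance weighting is an assumption for illustration.
    cand_d2 = np.take_along_axis(d2, candidates, axis=1)
    logits = -(cand_d2 - cand_d2.min(axis=1, keepdims=True))  # numerically stable
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    reconstructed = (codebook[candidates] * w[..., None]).sum(axis=1)

    return codebook, candidates, reconstructed.reshape(weight.shape)

# Example on a small random matrix standing in for one DiT linear layer.
W = np.random.randn(512, 512).astype(np.float32)
codebook, candidates, W_hat = vq_candidate_decompose(W)
print(codebook.shape, candidates.shape, np.abs(W - W_hat).mean())
```

The abstract's zero-data, block-wise calibration step, which selects the final assignment from each candidate set while updating the codebook, is not shown in this sketch.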