VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

August 30, 2024
Authors: Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang
cs.AI

Abstract

Diffusion Transformer models (DiTs) have transitioned the network architecture from traditional UNets to transformers, demonstrating exceptional capabilities in image generation. Although DiTs have been widely applied to high-definition video generation tasks, their large parameter size hinders inference on edge devices. Vector quantization (VQ) can decompose model weights into a codebook and assignments, allowing extreme weight quantization and significantly reducing memory usage. In this paper, we propose VQ4DiT, a fast post-training vector quantization method for DiTs. We found that traditional VQ methods calibrate only the codebook without calibrating the assignments. This leads to weight sub-vectors being incorrectly assigned to the same assignment, providing inconsistent gradients to the codebook and yielding suboptimal results. To address this challenge, VQ4DiT calculates a candidate assignment set for each weight sub-vector based on Euclidean distance and reconstructs the sub-vector as a weighted average of its candidates. Then, using a zero-data, block-wise calibration method, the optimal assignment is efficiently selected from the set while the codebook is calibrated. VQ4DiT quantizes a DiT XL/2 model on a single NVIDIA A100 GPU in 20 minutes to 5 hours, depending on the quantization setting. Experiments show that VQ4DiT establishes a new state of the art in the trade-off between model size and performance, quantizing weights to 2-bit precision while retaining acceptable image generation quality.
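
The sketch below illustrates the decomposition the abstract describes: splitting a weight matrix into sub-vectors, matching them against a codebook by Euclidean distance, and keeping a small candidate assignment set per sub-vector whose weighted average reconstructs the weights before calibration. It is a minimal illustration, not the authors' implementation; all function names, shapes, and hyperparameters are assumptions.

```python
import torch

def vector_quantize(weight: torch.Tensor, codebook_size: int = 256,
                    sub_dim: int = 4, num_candidates: int = 3):
    """Illustrative sketch of VQ with candidate assignment sets.

    The weight tensor is reshaped into sub-vectors of length `sub_dim`
    (assumed to divide weight.numel()). Instead of committing each
    sub-vector to its single nearest codeword, the `num_candidates`
    closest codewords are retained as a candidate set, and the sub-vector
    is reconstructed as a distance-weighted average of those candidates.
    """
    sub_vectors = weight.reshape(-1, sub_dim)                     # (N, d)

    # Initialize the codebook from a random subset of sub-vectors
    # (k-means initialization would be a natural alternative).
    idx = torch.randperm(sub_vectors.shape[0])[:codebook_size]
    codebook = sub_vectors[idx].clone()                           # (K, d)

    # Euclidean distance between every sub-vector and every codeword.
    dists = torch.cdist(sub_vectors, codebook)                    # (N, K)

    # Candidate assignment set: indices of the closest codewords.
    cand_dists, candidates = dists.topk(num_candidates, dim=1,
                                        largest=False)            # (N, C)

    # Distance-based ratios; the reconstruction is a weighted average of
    # the candidate codewords until one assignment is finally selected.
    ratios = torch.softmax(-cand_dists, dim=1)                    # (N, C)
    recon = (ratios.unsqueeze(-1) * codebook[candidates]).sum(dim=1)

    return codebook, candidates, ratios, recon.reshape(weight.shape)
```

A subsequent zero-data, block-wise calibration step (not shown) would pick a single assignment per sub-vector from `candidates` while fine-tuning `codebook`, matching the procedure outlined in the abstract.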
