VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
August 30, 2024
Authors: Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang
cs.AI
Abstract
Diffusion Transformer models (DiTs) have transitioned the network
architecture from traditional UNets to transformers, demonstrating exceptional
capabilities in image generation. Although DiTs have been widely applied to
high-definition video generation tasks, their large parameter size hinders
inference on edge devices. Vector quantization (VQ) can decompose model weights
into a codebook and assignments, allowing extreme weight quantization and
significantly reducing memory usage. In this paper, we propose VQ4DiT, a fast
post-training vector quantization method for DiTs. We found that traditional VQ
methods calibrate only the codebook without calibrating the assignments. This
causes weight sub-vectors to be incorrectly mapped to the same assignment,
providing inconsistent gradients to the codebook and yielding suboptimal
results. To address this challenge, VQ4DiT computes a candidate assignment
set for each weight sub-vector based on Euclidean distance and reconstructs the
sub-vector based on the weighted average. Then, using the zero-data and
block-wise calibration method, the optimal assignment from the set is
efficiently selected while calibrating the codebook. VQ4DiT quantizes a DiT
XL/2 model on a single NVIDIA A100 GPU in 20 minutes to 5 hours, depending
on the quantization settings. Experiments show that VQ4DiT
establishes a new state-of-the-art in model size and performance trade-offs,
quantizing weights to 2-bit precision while retaining acceptable image
generation quality.
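For illustration, below is a minimal NumPy sketch (not the authors' implementation) of two ideas named in the abstract: decomposing a weight matrix into sub-vectors quantized against a codebook with assignments, and building a per-sub-vector candidate assignment set by Euclidean distance with a weighted-average reconstruction. The sub-vector dimension, codebook size, number of candidates, random-sampling codebook initialization, and the softmax-over-negative-distance weighting are illustrative assumptions, not values or choices from the paper.

```python
import numpy as np

def vq_candidate_decompose(weight, sub_dim=4, codebook_size=256, num_candidates=3):
    """Toy VQ decomposition: codebook plus per-sub-vector candidate assignments."""
    # Split the weight matrix into (num_sub_vectors, sub_dim) sub-vectors.
    sub_vectors = weight.reshape(-1, sub_dim)

    # Illustrative codebook init: sample existing sub-vectors as codewords
    # (a real pipeline would typically run k-means here).
    rng = np.random.default_rng(0)
    idx = rng.choice(len(sub_vectors), codebook_size, replace=False)
    codebook = sub_vectors[idx].copy()

    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = ((sub_vectors ** 2).sum(1, keepdims=True)
          - 2.0 * sub_vectors @ codebook.T
          + (codebook ** 2).sum(1))

    # Candidate assignment set: the num_candidates nearest codewords per sub-vector.
    candidates = np.argsort(d2, axis=1)[:, :num_candidates]

    # Reconstruct each sub-vector as a weighted average of its candidate codewords;
    # the softmax-over-negative-distance weighting is an assumption for illustration.
    cand_d2 = np.take_along_axis(d2, candidates, axis=1)
    logits = -(cand_d2 - cand_d2.min(axis=1, keepdims=True))  # numerically stable
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    reconstructed = (codebook[candidates] * w[..., None]).sum(axis=1)

    return codebook, candidates, reconstructed.reshape(weight.shape)

# Example on a small random matrix standing in for one DiT linear layer.
W = np.random.randn(512, 512).astype(np.float32)
codebook, candidates, W_hat = vq_candidate_decompose(W)
print(codebook.shape, candidates.shape, np.abs(W - W_hat).mean())
```

The abstract's zero-data, block-wise calibration step, which selects the final assignment from each candidate set while updating the codebook, is not shown in this sketch.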