Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
August 31, 2024
Authors: Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk
cs.AI
Abstract
Text-to-image diffusion models have emerged as a powerful framework for
high-quality image generation given textual prompts. Their success has driven
the rapid development of production-grade diffusion models that consistently
increase in size and already contain billions of parameters. As a result,
state-of-the-art text-to-image models are becoming less accessible in practice,
especially in resource-limited environments. Post-training quantization (PTQ)
tackles this issue by compressing the pretrained model weights into lower-bit
representations. Recent diffusion quantization techniques primarily rely on
uniform scalar quantization, providing decent performance for the models
compressed to 4 bits. This work demonstrates that more versatile vector
quantization (VQ) may achieve higher compression rates for large-scale
text-to-image diffusion models. Specifically, we tailor vector-based PTQ
methods to recent billion-scale text-to-image models (SDXL and SDXL-Turbo), and
show that diffusion models of 2B+ parameters compressed to around 3 bits
using VQ exhibit image quality and textual alignment similar to those of
previous 4-bit compression techniques.
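To make the scalar-vs-vector distinction concrete, here is a minimal NumPy sketch (not the paper's actual method): uniform scalar quantization rounds every weight independently to an evenly spaced grid, while vector quantization splits the weights into short groups and replaces each group with the nearest entry of a learned codebook. The group length `d`, codebook size `K`, and the simple k-means fitting below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(32768).astype(np.float32)  # toy "weight tensor"

def scalar_quantize(x, bits=2):
    """Uniform scalar quantization: round each weight to a 2**bits-level grid."""
    levels = 2 ** bits
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / (levels - 1)
    return (lo + np.round((x - lo) / step) * step).astype(x.dtype)

def vector_quantize(x, d=4, K=256, iters=10):
    """Vector quantization: split weights into groups of d values and map each
    group to its nearest codeword; the codebook is fit with k-means steps."""
    groups = x.reshape(-1, d)
    codebook = groups[rng.choice(len(groups), K, replace=False)].copy()
    for _ in range(iters):
        # squared distance from every group to every codeword
        dists = ((groups[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)            # nearest codeword per group
        for k in range(K):                  # move codewords to cluster means
            members = groups[assign == k]
            if len(members):
                codebook[k] = members.mean(0)
    return codebook[assign].reshape(x.shape)

# Matched bit budget: 2-bit scalar grid vs. log2(256)/4 = 2 index bits/weight.
sq = scalar_quantize(w, bits=2)
vq = vector_quantize(w, d=4, K=256)

sq_mse = float(((w - sq) ** 2).mean())
vq_mse = float(((w - vq) ** 2).mean())
print(f"scalar 2-bit MSE: {sq_mse:.4f}   VQ 2-bit MSE: {vq_mse:.4f}")
```

At the same bit budget, the codebook can adapt to the weight distribution and exploit correlations within a group, so the VQ reconstruction error here comes out lower than the uniform grid's. Production-grade PTQ for SDXL-scale models involves far more (calibration data, per-layer codebooks, fine-tuning); this sketch only shows why vector codes can beat per-weight grids.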