Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
August 31, 2024
Authors: Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk
cs.AI
Abstract
Text-to-image diffusion models have emerged as a powerful framework for
high-quality image generation given textual prompts. Their success has driven
the rapid development of production-grade diffusion models that consistently
increase in size and already contain billions of parameters. As a result,
state-of-the-art text-to-image models are becoming less accessible in practice,
especially in resource-limited environments. Post-training quantization (PTQ)
tackles this issue by compressing the pretrained model weights into lower-bit
representations. Recent diffusion quantization techniques primarily rely on
uniform scalar quantization, providing decent performance for the models
compressed to 4 bits. This work demonstrates that more versatile vector
quantization (VQ) may achieve higher compression rates for large-scale
text-to-image diffusion models. Specifically, we tailor vector-based PTQ
methods to recent billion-scale text-to-image models (SDXL and SDXL-Turbo), and
show that diffusion models of 2B+ parameters compressed to around 3 bits
using VQ exhibit image quality and textual alignment similar to those of
previous 4-bit compression techniques.
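To make the abstract's contrast between scalar and vector quantization concrete, here is a minimal sketch of weight vector quantization with a k-means codebook. All settings are illustrative assumptions, not the paper's actual method: grouping weights into 2-dimensional vectors and assigning each to one of 64 codebook entries costs log2(64) = 6 bits per vector, i.e. 3 bits per weight, whereas uniform scalar quantization at the same budget would allot each individual weight only 8 levels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer weights": 4096 values, grouped into 2-dim vectors.
W = rng.normal(size=(2048, 2))

# 64-entry codebook -> log2(64) = 6 bits per 2-dim vector,
# i.e. 3 bits per weight (illustrative sizes only).
K = 64
codebook = W[rng.choice(len(W), K, replace=False)].copy()

# Plain k-means refinement of the codebook.
for _ in range(20):
    # Squared distance from every weight vector to every centroid.
    dist = ((W[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dist.argmin(1)
    for k in range(K):
        members = W[assign == k]
        if len(members):
            codebook[k] = members.mean(0)

# Dequantize: each weight vector is replaced by its centroid.
W_hat = codebook[assign]
bits_per_weight = np.log2(K) / W.shape[1]
mse = ((W - W_hat) ** 2).mean()
print(f"{bits_per_weight:.1f} bits/weight, reconstruction MSE {mse:.4f}")
```

Storing only the per-vector indices plus the small codebook is what yields the sub-4-bit rates discussed above; the vector grouping lets the codebook adapt to correlations between neighboring weights, which a uniform scalar grid cannot.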