Efficient Quantization Strategies for Latent Diffusion Models

Abstract

Latent Diffusion Models (LDMs) capture the dynamic evolution of latent variables over time, blending patterns and multimodality in a generative system. Although LDMs perform well in applications such as text-to-image generation, aided by robust text encoders and a variational autoencoder, the pressing need to deploy large generative models on edge devices motivates a search for more compact yet effective alternatives. Post-Training Quantization (PTQ), a method for compressing the operational size of deep learning models, runs into difficulties when applied to LDMs because of their temporal and structural complexity. This study proposes a strategy that quantizes LDMs efficiently, using the Signal-to-Quantization-Noise Ratio (SQNR) as the key evaluation metric. By treating the quantization discrepancy as relative noise and identifying the sensitive parts of a model, we propose an efficient quantization approach that combines global and local strategies. The global quantization process mitigates relative quantization noise by initiating higher-precision quantization on sensitive blocks, while local treatments address the specific challenges of quantization-sensitive and time-sensitive modules. Our experiments show that applying both global and local treatments yields highly efficient and effective PTQ of LDMs.
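To make the SQNR metric concrete, the following is a minimal sketch of how such a ratio is commonly computed, using the standard definition: 10·log10 of signal power over quantization-noise power. It is not taken from the paper. The symmetric per-tensor quantizer, the function names `fake_quantize` and `sqnr_db`, and the Gaussian test data are all illustrative assumptions, and the paper's exact formulation may differ.

```python
import math
import random


def fake_quantize(values, num_bits=8):
    """Quantize then dequantize with a symmetric uniform per-tensor grid.

    This quantizer is an illustrative assumption, not the paper's scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)  # nothing to quantize
    scale = max_abs / qmax
    # Round to the nearest grid point, clamp to the integer range, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def sqnr_db(reference, quantized):
    """SQNR in dB: signal power over the power of the quantization error."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0.0:
        return math.inf  # no quantization error at all
    if signal == 0.0:
        return -math.inf  # error present but no signal
    return 10.0 * math.log10(signal / noise)


# Hypothetical stand-in for a block's full-precision output.
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]

sqnr_8bit = sqnr_db(activations, fake_quantize(activations, num_bits=8))
sqnr_4bit = sqnr_db(activations, fake_quantize(activations, num_bits=4))
print(f"8-bit SQNR: {sqnr_8bit:.1f} dB, 4-bit SQNR: {sqnr_4bit:.1f} dB")
```

Under this reading, a block whose SQNR drops sharply at low bit width would be a candidate for the higher-precision treatment the abstract describes, since a lower SQNR means more relative quantization noise.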
