Efficient Quantization Strategies for Latent Diffusion Models
December 9, 2023
Authors: Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang
cs.AI
Abstract
Latent Diffusion Models (LDMs) capture the dynamic evolution of latent
variables over time, blending patterns and multimodality in a generative
system. Despite the proficiency of LDMs in various applications, such as
text-to-image generation, facilitated by robust text encoders and a variational
autoencoder, the critical need to deploy large generative models on edge
devices compels a search for more compact yet effective alternatives.
Post-Training Quantization (PTQ), a method to compress the operational size of
deep learning models, encounters challenges when applied to LDMs due to
temporal and structural complexities. This study proposes a quantization
strategy that efficiently quantizes LDMs, leveraging the
Signal-to-Quantization-Noise Ratio (SQNR) as a pivotal evaluation metric. By
treating the quantization discrepancy as relative noise and identifying the
sensitive part(s) of a model, we propose an efficient quantization approach
encompassing both global and local strategies. The global quantization process
mitigates relative quantization noise by initiating higher-precision
quantization on sensitive blocks, while local treatments address specific
challenges in quantization-sensitive and time-sensitive modules. Our
experimental results reveal that implementing both global and local treatments
yields a highly efficient and effective Post-Training Quantization (PTQ) of
LDMs.
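The abstract's core idea — treating the quantization discrepancy as relative noise and using SQNR to flag sensitive blocks — can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it assumes a generic uniform symmetric per-tensor quantizer and a random tensor standing in for a block's activations, and shows how SQNR drops as bit-width shrinks (a lower SQNR would mark a block as a candidate for higher-precision quantization under the global strategy):

```python
import numpy as np

def quantize_dequantize(x, num_bits=8):
    """Uniform symmetric post-training quantization of a tensor,
    followed by dequantization back to float (a generic PTQ sketch,
    not the paper's exact quantizer)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(x)) / qmax        # per-tensor scale
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def sqnr_db(signal, quantized):
    """Signal-to-Quantization-Noise Ratio in dB: the quantization
    discrepancy is treated as noise relative to the signal."""
    noise = signal - quantized
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
# Hypothetical activation tensor standing in for one block's output.
x = rng.normal(size=(1, 64, 32, 32)).astype(np.float32)

# Fewer bits -> larger relative noise -> lower SQNR.
print(f"8-bit SQNR: {sqnr_db(x, quantize_dequantize(x, 8)):.1f} dB")
print(f"4-bit SQNR: {sqnr_db(x, quantize_dequantize(x, 4)):.1f} dB")
```

In this framing, each block of the model would be quantized in isolation, its output SQNR measured against the full-precision output, and the blocks with the lowest SQNR kept at higher precision.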