Efficient Quantization Strategies for Latent Diffusion Models

Abstract

Latent Diffusion Models (LDMs) capture the dynamic evolution of latent variables over time, blending patterns and multimodality in a generative system. Although LDMs perform well in applications such as text-to-image generation, supported by robust text encoders and a variational autoencoder, the need to deploy large generative models on edge devices motivates a search for more compact yet effective alternatives. Post-Training Quantization (PTQ), a method for compressing the operational size of deep learning models, encounters challenges when applied to LDMs because of their temporal and structural complexity. This study proposes a strategy that efficiently quantizes LDMs, using the Signal-to-Quantization-Noise Ratio (SQNR) as the key evaluation metric. By treating the quantization discrepancy as relative noise and identifying the sensitive parts of a model, we develop an efficient quantization approach that combines global and local strategies. The global quantization process mitigates relative quantization noise by applying higher-precision quantization to sensitive blocks, while local treatments address specific challenges in quantization-sensitive and time-sensitive modules. Our experiments show that combining the global and local treatments yields a highly efficient and effective PTQ of LDMs.
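
To make the idea of treating quantization error as relative noise concrete, the following is a minimal sketch of an SQNR computation. It is not the authors' implementation: the function names `quantize_uniform` and `sqnr_db`, the symmetric per-tensor uniform quantizer, and the toy activation values are all assumptions chosen for illustration. SQNR is computed in its standard form, ten times the base-10 logarithm of signal power over quantization-noise power.

```python
import math

def quantize_uniform(values, num_bits):
    """Hypothetical symmetric uniform fake-quantizer (an assumption, not the paper's).

    Values are rounded onto a signed integer grid and mapped back to floats,
    so the result can be compared directly with the full-precision input.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    scale = max_abs / levels
    return [max(-levels, min(levels, round(v / scale))) * scale for v in values]

def sqnr_db(reference, quantized):
    """SQNR in dB: 10 * log10(signal power / quantization-noise power)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0.0:
        return math.inf  # no quantization error at all
    return 10.0 * math.log10(signal / noise)

# Toy stand-in for a block's full-precision activations (illustrative only).
activations = [math.sin(0.37 * i) * (1.0 + 0.01 * i) for i in range(256)]

sqnr_4bit = sqnr_db(activations, quantize_uniform(activations, 4))
sqnr_8bit = sqnr_db(activations, quantize_uniform(activations, 8))
print(f"4-bit SQNR: {sqnr_4bit:.2f} dB, 8-bit SQNR: {sqnr_8bit:.2f} dB")
```

A lower SQNR marks a part of the model where quantization noise is large relative to the signal. Under the strategy described in the abstract, such parts would be candidates for higher-precision quantization, whereas this sketch only shows how the metric itself responds to bit width.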
