Efficient Personalization of Quantized Diffusion Model without Backpropagation
March 19, 2025
Authors: Hoigi Seo, Wongi Jeong, Kyungryeol Lee, Se Young Chun
cs.AI
Abstract
Diffusion models have shown remarkable performance in image synthesis, but they demand extensive computational and memory resources for training, fine-tuning, and inference. Although advanced quantization techniques have successfully minimized memory usage for inference, training and fine-tuning these quantized models still require large amounts of memory, possibly due to dequantization for accurate gradient computation and/or backpropagation in gradient-based algorithms. However, memory-efficient fine-tuning is particularly desirable for applications such as personalization, which often must run on edge devices like mobile phones with private data. In this work, we address this challenge by quantizing a diffusion model with personalization via Textual Inversion and by leveraging zeroth-order optimization on personalization tokens without dequantization, so that it does not require the memory-intensive storage of gradients and activations for backpropagation. Since gradient estimates from zeroth-order optimization are quite noisy for the single image or few images used in personalization, we propose to denoise the estimated gradient by projecting it onto a subspace constructed from the past history of the tokens, dubbed Subspace Gradient. In addition, we investigate the influence of text embeddings on image generation, leading to our proposed timestep sampling scheme, dubbed Partial Uniform Timestep Sampling, which samples from the effective diffusion timesteps. Our method achieves image and text alignment scores comparable to prior methods for personalizing Stable Diffusion using only forward passes, while reducing training memory demand by up to 8.2×.
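The forward-only optimization described above can be illustrated with a minimal sketch: a two-point SPSA-style zeroth-order gradient estimate, denoised by projecting onto a subspace spanned by recent token iterates. This is a toy stand-in on a quadratic loss, not the paper's implementation; the window size, subspace rank, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_gradient(loss_fn, token, eps=1e-3):
    """Two-point SPSA estimate of the gradient of loss_fn at token,
    using only forward evaluations (no backpropagation)."""
    u = rng.standard_normal(token.shape)
    delta = (loss_fn(token + eps * u) - loss_fn(token - eps * u)) / (2 * eps)
    return delta * u  # directional derivative times the probe direction

def project_to_subspace(g_hat, history, rank=4):
    """Denoise an estimated gradient by projecting it onto the span of the
    top-`rank` left singular vectors of past token iterates (the columns
    of `history`). `rank` here is an illustrative choice."""
    U, _, _ = np.linalg.svd(history, full_matrices=False)
    basis = U[:, :rank]
    return basis @ (basis.T @ g_hat)

# Toy personalization stand-in: fit a 16-dim "token" to a fixed target
# with forward passes only (hyperparameters are illustrative).
dim, steps, lr, window = 16, 400, 0.01, 8
target = rng.standard_normal(dim)
loss = lambda t: float(np.sum((t - target) ** 2))

token = np.zeros(dim)
history = []
for _ in range(steps):
    g = spsa_gradient(loss, token)
    if len(history) >= window:  # enough iterates to span a subspace
        g = project_to_subspace(g, np.stack(history[-window:], axis=1))
    token = token - lr * g
    history.append(token.copy())
```

Because only loss evaluations are needed, the quantized model weights are never dequantized for a backward pass, which is where the memory savings come from.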
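Partial Uniform Timestep Sampling, as described, restricts uniform sampling to the diffusion timesteps where the text embedding is actually influential. A hedged sketch, assuming the "effective" range is expressed as a band of the training schedule (the band endpoints below are illustrative placeholders, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_uniform_timesteps(batch_size, num_train_timesteps=1000,
                              lo_frac=0.3, hi_frac=0.9):
    """Sample timesteps uniformly from a restricted band of the diffusion
    schedule, skipping timesteps where optimizing the text embedding has
    little effect. lo_frac/hi_frac are illustrative assumptions."""
    lo = int(lo_frac * num_train_timesteps)
    hi = int(hi_frac * num_train_timesteps)
    return rng.integers(lo, hi, size=batch_size)

timesteps = partial_uniform_timesteps(8)
```

Compared with uniform sampling over the full schedule, this concentrates the limited zeroth-order updates on timesteps that matter for the personalization token.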