Efficient Personalization of Quantized Diffusion Model without Backpropagation
March 19, 2025
Authors: Hoigi Seo, Wongi Jeong, Kyungryeol Lee, Se Young Chun
cs.AI
Abstract
Diffusion models have shown remarkable performance in image synthesis, but
they demand extensive computational and memory resources for training,
fine-tuning, and inference. Although advanced quantization techniques have
successfully minimized memory usage for inference, training and fine-tuning
these quantized models still require large amounts of memory, possibly due to
dequantization for accurate gradient computation and/or backpropagation for
gradient-based algorithms. However, memory-efficient fine-tuning is
particularly desirable for applications such as personalization, which often
must run on edge devices like mobile phones with private data. In this work,
we address this challenge by quantizing a diffusion model personalized via
Textual Inversion and by applying zeroth-order optimization to the
personalization tokens without dequantization, so that no gradient or
activation storage is required for backpropagation, which consumes
considerable memory. Since gradient estimates from zeroth-order optimization
are quite noisy for the single image or few images used in personalization, we
propose to denoise the estimated gradient by projecting it onto a subspace
constructed from the past history of the tokens, dubbed Subspace Gradient. In
addition, we investigate the influence of text embeddings in image generation,
leading to a proposed timestep sampling scheme, dubbed Partial Uniform
Timestep Sampling, that samples only effective diffusion timesteps. Our method
achieves image and text alignment scores comparable to prior methods for
personalizing Stable Diffusion using only forward passes, while reducing
training memory demand by up to 8.2×.
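The two key ideas in the abstract can be sketched together: estimate the gradient of the personalization token with only forward passes (an SPSA-style zeroth-order estimate), then denoise it by projecting onto the subspace spanned by recent token iterates. This is a rough illustration on a toy quadratic loss, not the paper's implementation; the token dimension, perturbation scale, learning rate, and window size of 4 are all hypothetical choices.

```python
import numpy as np

np.random.seed(0)  # make the toy run reproducible

def spsa_gradient(loss_fn, token, eps=1e-3, n_samples=4):
    """Estimate the gradient of loss_fn at `token` using only forward
    evaluations: average central differences along random directions."""
    d = token.size
    g = np.zeros(d)
    for _ in range(n_samples):
        u = np.random.randn(d)  # random perturbation direction
        g += (loss_fn(token + eps * u) - loss_fn(token - eps * u)) / (2 * eps) * u
    return g / n_samples

def project_to_subspace(grad, history):
    """Denoise the noisy estimate by projecting it onto the subspace
    spanned by past token iterates (the columns of `history`)."""
    Q, _ = np.linalg.qr(history)   # orthonormal basis of the history subspace
    return Q @ (Q.T @ grad)

# Toy usage: fit a 16-dim "token" to a target under a quadratic loss.
target = np.random.default_rng(0).standard_normal(16)
loss_fn = lambda t: float(np.sum((t - target) ** 2))

token = np.zeros(16)
history = []
lr = 0.02
for step in range(200):
    g = spsa_gradient(loss_fn, token)
    if len(history) >= 4:  # once enough history exists, project the estimate
        g = project_to_subspace(g, np.stack(history[-4:], axis=1))
    token -= lr * g
    history.append(token.copy())
```

Because no backpropagation graph is built, the only memory cost beyond the forward pass is the small token vector and its short history, which is the source of the memory savings the abstract reports.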
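Partial Uniform Timestep Sampling, as described in the abstract, draws training timesteps uniformly from only the sub-range of diffusion timesteps where the text embedding meaningfully influences generation. A minimal sketch follows; the bounds `t_min`/`t_max` and the 1000-step schedule are illustrative assumptions, not values from the paper.

```python
import numpy as np

def partial_uniform_timesteps(n, t_min=200, t_max=800, rng=None):
    """Sample n diffusion timesteps uniformly from [t_min, t_max) rather
    than the full schedule, concentrating personalization updates on the
    timesteps where the text embedding is effective."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.integers(t_min, t_max, size=n)

# Example: draw a batch of timesteps for training steps.
ts = partial_uniform_timesteps(1000, rng=np.random.default_rng(0))
```

Restricting the sampling range skips timesteps whose updates contribute little, so each forward-only optimization step is spent where it matters.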