SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

September 10, 2024
Authors: Teng Hu, Jiangning Zhang, Ran Yi, Hongrui Huang, Yabiao Wang, Lizhuang Ma
cs.AI

Abstract

In recent years, the development of diffusion models has led to significant progress in image and video generation tasks, with pre-trained models like the Stable Diffusion series playing a crucial role. Inspired by model pruning, which lightens large pre-trained models by removing unimportant parameters, we propose a novel model fine-tuning method that makes full use of these ineffective parameters and endows the pre-trained model with new task-specific capabilities. In this work, we first investigate the importance of parameters in pre-trained diffusion models and discover that the smallest 10% to 20% of parameters by absolute value do not contribute to the generation process. Based on this observation, we propose a method termed SaRA that re-utilizes these temporarily ineffective parameters, which is equivalent to optimizing a sparse weight matrix to learn the task-specific knowledge. To mitigate overfitting, we propose a nuclear-norm-based low-rank sparse training scheme for efficient fine-tuning. Furthermore, we design a new progressive parameter adjustment strategy to make full use of the re-trained/fine-tuned parameters. Finally, we propose a novel unstructural backpropagation strategy, which significantly reduces memory costs during fine-tuning. Our method enhances the generative capabilities of pre-trained models in downstream applications and outperforms traditional fine-tuning methods like LoRA in maintaining the model's generalization ability. We validate our approach through fine-tuning experiments on SD models, demonstrating significant improvements. SaRA also offers the practical advantage of requiring only a single line of code modification for efficient implementation, and it is seamlessly compatible with existing methods.
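To make the core idea concrete, the sketch below illustrates how the scheme described in the abstract could be wired up for a single 2D weight matrix in PyTorch: select the lowest-magnitude entries of a pre-trained weight, train only a sparse additive update on those entries, and regularize that update with a nuclear-norm penalty to keep it low-rank. This is a minimal illustration under those assumptions, not the authors' released implementation; the class name, mask ratio, and penalty weight are invented for the example, and the progressive adjustment and unstructural backpropagation strategies are not shown.

import torch

def build_low_magnitude_mask(weight: torch.Tensor, ratio: float = 0.1) -> torch.Tensor:
    # Mark the `ratio` fraction of entries with the smallest absolute value
    # (the "temporarily ineffective" parameters in the abstract's terminology).
    k = max(1, int(weight.numel() * ratio))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() <= threshold

class SparseLowRankAdapter(torch.nn.Module):
    # Hypothetical adapter: learns a sparse additive update restricted to the
    # masked entries of a frozen, pre-trained 2D weight matrix.
    def __init__(self, weight: torch.Tensor, ratio: float = 0.1):
        super().__init__()
        self.register_buffer("base", weight.detach().clone())                  # frozen pre-trained weights
        self.register_buffer("mask", build_low_magnitude_mask(weight, ratio).float())
        self.delta = torch.nn.Parameter(torch.zeros_like(weight))              # trainable sparse update

    def effective_weight(self) -> torch.Tensor:
        # Only masked (low-magnitude) entries are actually modified; gradients
        # to the unmasked entries of `delta` are zeroed by the multiplication.
        return self.base + self.delta * self.mask

    def nuclear_norm(self) -> torch.Tensor:
        # Rank-constraining penalty (sum of singular values) on the sparse update.
        return torch.linalg.matrix_norm(self.delta * self.mask, ord="nuc")

# Usage sketch: add the penalty to the task loss during fine-tuning, e.g.
#   loss = task_loss + 1e-4 * adapter.nuclear_norm()

One point this sketch makes explicit is that only the masked entries ever receive gradients, so the update is a sparse matrix by construction, while the nuclear-norm term discourages that sparse update from growing in rank, which is how the abstract motivates the protection against overfitting.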
