DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
February 27, 2024
Authors: Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen
cs.AI
Abstract
In the realm of subject-driven text-to-image (T2I) generative models, recent
developments like DreamBooth and BLIP-Diffusion have led to impressive results
yet encounter limitations due to their intensive fine-tuning demands and
substantial parameter requirements. While the low-rank adaptation (LoRA) module
within DreamBooth offers a reduction in trainable parameters, it introduces a
pronounced sensitivity to hyperparameters, leading to a compromise between
parameter efficiency and the quality of T2I personalized image synthesis.
To address these constraints, we introduce DiffuseKronA, a
novel Kronecker product-based adaptation module that not only significantly
reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth
and the original DreamBooth, respectively, but also enhances the quality of
image synthesis. Crucially, DiffuseKronA mitigates the issue of
hyperparameter sensitivity, delivering consistent high-quality generations
across a wide range of hyperparameters, thereby diminishing the necessity for
extensive fine-tuning. Furthermore, a more controllable decomposition makes
DiffuseKronA more interpretable, and it can even achieve up to a 50%
reduction in parameters with results comparable to LoRA-DreamBooth. Evaluated against diverse
and complex input images and text prompts, DiffuseKronA consistently
outperforms existing models, producing diverse images of higher quality with
improved fidelity and a more accurate color distribution of objects, all the
while upholding exceptional parameter efficiency, thus presenting a substantial
advancement in the field of T2I generative modeling. Our project page,
which links to the code and pre-trained checkpoints, is available at
https://diffusekrona.github.io/.
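
To make the parameter-efficiency argument concrete, the following is a minimal sketch of how a Kronecker product-based weight update compares with a low-rank (LoRA-style) update in trainable parameter count. It is not the authors' implementation: the helper names (`kron`, `kron_param_count`, `lora_param_count`), the toy 8x8 layer size, and the factor shapes are all hypothetical, chosen only for illustration, and the numbers do not reproduce the 35% figure reported above.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    out = [[0] * (cols_a * cols_b) for _ in range(rows_a * rows_b)]
    for i in range(rows_a):
        for j in range(cols_a):
            for p in range(rows_b):
                for q in range(cols_b):
                    # Block (i, j) of the result is a[i][j] * b
                    out[i * rows_b + p][j * cols_b + q] = a[i][j] * b[p][q]
    return out


def kron_param_count(a_shape, b_shape):
    """Trainable parameters when the update is stored as two Kronecker factors."""
    return a_shape[0] * a_shape[1] + b_shape[0] * b_shape[1]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` update B @ A (B: d_out x rank, A: rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical toy layer: an 8 x 8 weight matrix (assumption for illustration).
d_out, d_in = 8, 8

# Kronecker factors whose product has the same shape as the weight:
# (2 x 4) kron (4 x 2) -> (8 x 8).
A = [[0.1 * (i + j) for j in range(4)] for i in range(2)]
B = [[0.01 * (i - j) for j in range(2)] for i in range(4)]
delta_w = kron(A, B)

full_params = d_out * d_in                        # 64
kron_params = kron_param_count((2, 4), (4, 2))    # 8 + 8 = 16
lora_params = lora_param_count(d_out, d_in, 2)    # 2 * (8 + 8) = 32

print("update shape:", len(delta_w), "x", len(delta_w[0]))
print("full:", full_params, "LoRA (rank 2):", lora_params, "Kronecker:", kron_params)
```

In this toy setting the two small factors expand into a full-size update while storing fewer numbers than the rank-2 low-rank pair. Choosing the factor shapes is the extra degree of freedom that a Kronecker decomposition offers over a single rank hyperparameter.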