DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
February 27, 2024
Authors: Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen
cs.AI
Abstract
In the realm of subject-driven text-to-image (T2I) generative models, recent
developments like DreamBooth and BLIP-Diffusion have led to impressive results
yet encounter limitations due to their intensive fine-tuning demands and
substantial parameter requirements. While the low-rank adaptation (LoRA) module
within DreamBooth offers a reduction in trainable parameters, it introduces a
pronounced sensitivity to hyperparameters, leading to a compromise between
parameter efficiency and the quality of T2I personalized image synthesis.
Addressing these constraints, we introduce DiffuseKronA, a
novel Kronecker product-based adaptation module that not only significantly
reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth
and the original DreamBooth, respectively, but also enhances the quality of
image synthesis. Crucially, DiffuseKronA mitigates the issue of
hyperparameter sensitivity, delivering consistent high-quality generations
across a wide range of hyperparameters, thereby diminishing the necessity for
extensive fine-tuning. Furthermore, a more controllable decomposition makes
DiffuseKronA more interpretable, and it can even achieve up to a 50%
reduction in parameters with results comparable to LoRA-DreamBooth. Evaluated against diverse
and complex input images and text prompts, DiffuseKronA consistently
outperforms existing models, producing diverse images of higher quality with
improved fidelity and a more accurate color distribution of objects, all the
while upholding exceptional parameter efficiency, thus presenting a substantial
advancement in the field of T2I generative modeling. Our project page,
with links to the code and pre-trained checkpoints, is available at
https://diffusekrona.github.io/.
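
To make the idea of a Kronecker product-based weight update concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`kron`, `lora_param_count`, `kron_param_count`), the 320 x 320 layer size, the LoRA rank of 4, and the factor shapes are all assumptions chosen for illustration, and the resulting counts are not the figures reported in the abstract. The sketch only shows how a large update matrix can be expressed as the Kronecker product of two small factors, and how the number of trainable parameters for such a factorization compares with a low-rank (LoRA-style) one for a single layer.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    out = [[0.0] * (cols_a * cols_b) for _ in range(rows_a * rows_b)]
    for i in range(rows_a):
        for j in range(cols_a):
            for k in range(rows_b):
                for m in range(cols_b):
                    # Block (i, j) of the result is a[i][j] * b.
                    out[i * rows_b + k][j * cols_b + m] = a[i][j] * b[k][m]
    return out


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a low-rank update (d_out x rank) @ (rank x d_in)."""
    return rank * (d_out + d_in)


def kron_param_count(a_rows, a_cols, b_rows, b_cols):
    """Trainable parameters of a Kronecker update A (x) B."""
    return a_rows * a_cols + b_rows * b_cols


# Toy example: two 2 x 2 factors produce a 4 x 4 update matrix.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
delta_w = kron(A, B)

# Hypothetical 320 x 320 layer (sizes assumed, not taken from the paper).
full_params = 320 * 320                          # dense update
lora_params = lora_param_count(320, 320, 4)      # assumed rank 4
kron_params = kron_param_count(16, 20, 20, 16)   # (16*20) x (20*16) = 320 x 320
```

One property worth noting is that the rank of a Kronecker product equals the product of the ranks of its factors, so the update is not capped at a small rank the way a rank-r low-rank update is. How the factor shapes are chosen in DiffuseKronA itself is described in the paper and on the project page, not here.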