DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

Abstract

In subject-driven text-to-image (T2I) generation, recent methods such as DreamBooth and BLIP-Diffusion produce impressive results but are limited by intensive fine-tuning demands and substantial parameter requirements. The low-rank adaptation (LoRA) module within DreamBooth reduces the number of trainable parameters, but it introduces a pronounced sensitivity to hyperparameters, forcing a compromise between parameter efficiency and the quality of personalized T2I image synthesis. To address these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, while also improving the quality of image synthesis. Crucially, DiffuseKronA mitigates hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters and thereby reducing the need for extensive fine-tuning. Furthermore, its more controllable decomposition makes DiffuseKronA more interpretable, and it can even achieve up to a 50% reduction with results comparable to LoRA-DreamBooth. Evaluated on diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse, higher-quality images with improved fidelity and a more accurate color distribution of objects while maintaining exceptional parameter efficiency. This represents a substantial advance in T2I generative modeling. Our project page, with links to the code and pre-trained checkpoints, is available at https://diffusekrona.github.io/.
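To make the parameter-efficiency argument concrete, the following is a minimal sketch comparing the trainable-parameter count of a generic LoRA update (a product of two low-rank matrices) with a generic Kronecker-factored update (a Kronecker product of two small matrices). The layer size, rank, factor shapes, and function names are all illustrative assumptions; they are not the configuration or the exact formulation used by DiffuseKronA, and the resulting counts are not the figures reported in the abstract.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    out = [[0.0] * (cols_a * cols_b) for _ in range(rows_a * rows_b)]
    for i in range(rows_a):
        for j in range(cols_a):
            for k in range(rows_b):
                for l in range(cols_b):
                    # Block (i, j) of the result is a[i][j] * b
                    out[i * rows_b + k][j * cols_b + l] = a[i][j] * b[k][l]
    return out


def lora_param_count(d_out, d_in, rank):
    """Parameters of a low-rank update: (d_out x rank) times (rank x d_in)."""
    return rank * (d_out + d_in)


def kron_param_count(a_shape, b_shape):
    """Parameters of a Kronecker update with factors of the given shapes."""
    (a_rows, a_cols), (b_rows, b_cols) = a_shape, b_shape
    return a_rows * a_cols + b_rows * b_cols


# Hypothetical 320 x 320 weight matrix (assumed size, for illustration only).
d = 320
full_params = d * d                                   # full fine-tuning
lora_params = lora_param_count(d, d, rank=4)          # assumed rank of 4
kron_params = kron_param_count((16, 16), (20, 20))    # 16 * 20 = 320 per side

print(f"full: {full_params}, LoRA: {lora_params}, Kronecker: {kron_params}")

# The Kronecker product of the two small factors has the full layer shape.
a = [[1.0] * 16 for _ in range(16)]
b = [[0.5] * 20 for _ in range(20)]
delta_w = kron(a, b)
print(len(delta_w), len(delta_w[0]))
```

In this toy setting the Kronecker factors cover the full 320 x 320 update with fewer trainable values than the rank-4 low-rank pair, but the actual ratio depends on how the factor shapes and the rank are chosen.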
