

DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

June 25, 2023
Authors: Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan
cs.AI

Abstract

Denoising diffusion probabilistic models (DDPMs) have proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, typical diffusion models and modern large-scale conditional generative models, such as text-to-image models, are vulnerable to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images, but few prior works explore DDPM-based domain-driven generation, which aims to learn the common features of target domains while maintaining diversity. This paper proposes DomainStudio, a novel approach that adapts DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by the source domain and to obtain high-quality, diverse adapted samples in the target domain. We propose keeping the relative distances between adapted samples to achieve considerable generation diversity, and we further enhance the learning of high-frequency details to improve generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. It also significantly alleviates overfitting in conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.
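The abstract highlights two training-time ingredients: preserving the relative distances between adapted samples to maintain diversity, and an extra loss on high-frequency details to improve quality. The minimal PyTorch sketch below illustrates how such auxiliary losses could be wired into a DDPM fine-tuning step. It is an illustration only: the function names (`pairwise_sim_probs`, `relative_distance_loss`, `high_freq_loss`, `total_loss`), the cosine-similarity/KL formulation, the blur-based high-pass filter, and the loss weights are assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_sim_probs(feats: torch.Tensor) -> torch.Tensor:
    """Turn a batch of model outputs into a probability distribution over
    pairwise cosine similarities (each sample vs. all others)."""
    feats = feats.flatten(start_dim=1)                                 # (B, D)
    sim = F.cosine_similarity(feats[:, None], feats[None, :], dim=-1)  # (B, B)
    off_diag = ~torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim[off_diag].view(len(feats), -1)                           # drop self-similarity
    return F.softmax(sim, dim=-1)

def relative_distance_loss(adapted_out: torch.Tensor,
                           source_out: torch.Tensor) -> torch.Tensor:
    """Encourage the adapted model to preserve the *relative* distances
    between samples that the frozen source model produces."""
    p_src = pairwise_sim_probs(source_out).detach()  # frozen source: no grads
    p_ada = pairwise_sim_probs(adapted_out)
    return F.kl_div(p_ada.log(), p_src, reduction="batchmean")

def high_freq(img: torch.Tensor) -> torch.Tensor:
    """Crude high-pass filter: the image minus a local average.
    (A Haar wavelet transform would be another common choice.)"""
    return img - F.avg_pool2d(img, kernel_size=3, stride=1, padding=1)

def high_freq_loss(denoised: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Match the high-frequency content of denoised images to real images."""
    return F.mse_loss(high_freq(denoised), high_freq(target))

# Hypothetical fine-tuning objective: auxiliary terms are added to the
# standard DDPM noise-prediction loss. lambda_dist and lambda_hf are
# illustrative weights, not values from the paper.
def total_loss(eps_pred, eps, denoised, target, adapted_out, source_out,
               lambda_dist=1.0, lambda_hf=0.5):
    l_simple = F.mse_loss(eps_pred, eps)  # usual DDPM objective
    l_dist = relative_distance_loss(adapted_out, source_out)
    l_hf = high_freq_loss(denoised, target)
    return l_simple + lambda_dist * l_dist + lambda_hf * l_hf
```

In a fine-tuning loop, `source_out` would come from a frozen copy of the pre-trained model run on the same noised inputs, so the distance term pulls the adapted model toward the source model's sample-to-sample geometry rather than toward any individual training image, which is what counteracts overfitting on a few-shot target set.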