DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

June 25, 2023
Authors: Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan
cs.AI

Abstract

Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. Typical diffusion models, as well as modern large-scale conditional generative models such as text-to-image models, are vulnerable to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images. However, few prior works explore DDPM-based domain-driven generation, which aims to learn the common features of a target domain while maintaining diversity. This paper proposes DomainStudio, a novel approach that adapts DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by the source domain and to obtain high-quality, diverse adapted samples in the target domain. We propose keeping the relative distances between adapted samples to achieve considerable generation diversity. In addition, we further enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. It also significantly alleviates overfitting in conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.
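
The abstract names two ingredients without giving implementation details: preserving relative distances between adapted samples, and strengthening the learning of high-frequency detail. As a rough illustration only (all function names and the specific formulations below are assumptions, not the paper's exact losses), the PyTorch sketch pairs a distance-consistency term, which matches the distribution of pairwise cosine similarities under the adapted model to that under the frozen source model, with a Haar-wavelet high-frequency reconstruction term:

```python
import torch
import torch.nn.functional as F


def pairwise_similarity_distribution(features: torch.Tensor) -> torch.Tensor:
    """Softmax over each sample's cosine similarities to the other samples
    in the batch, i.e. a distribution of relative distances."""
    n = features.size(0)
    feats = F.normalize(features.flatten(1), dim=1)  # (N, D), unit norm
    sim = feats @ feats.t()                          # (N, N) cosine similarities
    off_diag = sim[~torch.eye(n, dtype=torch.bool, device=sim.device)]
    return F.softmax(off_diag.view(n, n - 1), dim=1)


def relative_distance_loss(adapted_feats: torch.Tensor,
                           source_feats: torch.Tensor) -> torch.Tensor:
    """KL divergence pulling the adapted model's pairwise-distance profile
    toward the frozen source model's, so adapted samples stay roughly as
    mutually distinct as their source counterparts."""
    p_src = pairwise_similarity_distribution(source_feats).detach()
    log_p_ada = pairwise_similarity_distribution(adapted_feats).log()
    return F.kl_div(log_p_ada, p_src, reduction="batchmean")


def haar_high_freq(x: torch.Tensor) -> torch.Tensor:
    """One-level Haar decomposition of a (B, C, H, W) batch; returns the
    three high-frequency sub-bands (LH, HL, HH) stacked along channels."""
    lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
    hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([lh, hl, hh]).unsqueeze(1).to(x)  # (3, 1, 2, 2)
    b, c, h, w = x.shape
    hf = F.conv2d(x.reshape(b * c, 1, h, w), kernels, stride=2)
    return hf.reshape(b, 3 * c, h // 2, w // 2)


def high_freq_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE restricted to high-frequency sub-bands, emphasizing fine detail."""
    return F.mse_loss(haar_high_freq(pred), haar_high_freq(target))
```

In an adaptation loop, both terms would be added to the standard DDPM denoising objective with tunable weights; the weights and the choice of features to compare (e.g., denoised predictions or intermediate U-Net activations) are left unspecified here, as the abstract does not state them.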