DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

Abstract

Denoising diffusion probabilistic models (DDPMs) have been shown to synthesize high-quality images with remarkable diversity when trained on large amounts of data. However, typical diffusion models and modern large-scale conditional generative models, such as text-to-image models, are prone to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images, but few have explored DDPM-based domain-driven generation, which aims to learn the common features of a target domain while maintaining diversity. This paper proposes DomainStudio, a novel approach for adapting DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to preserve the diversity of subjects provided by the source domain while producing high-quality and diverse adapted samples in the target domain. We propose keeping the relative distances between adapted samples to achieve considerable generation diversity. In addition, we enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. Moreover, it significantly relieves overfitting in conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.
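The abstract does not say how the relative distances between adapted samples are measured or enforced. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes each sample is represented by a nonzero feature vector, turns the pairwise cosine similarities within a batch into a probability distribution, and penalizes the KL divergence between the source model's distribution and the adapted model's. Every function name here is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(features, i):
    """Distribution describing how similar sample i is to every other sample."""
    sims = [
        cosine_similarity(features[i], f)
        for j, f in enumerate(features)
        if j != i
    ]
    return softmax(sims)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def relative_distance_loss(source_features, adapted_features):
    """Mean KL divergence between source and adapted similarity distributions.

    Assumption: both lists hold features for the same batch of noise inputs,
    one list from the pre-trained source model and one from the adapted model.
    """
    n = len(source_features)
    total = 0.0
    for i in range(n):
        p_source = similarity_distribution(source_features, i)
        p_adapted = similarity_distribution(adapted_features, i)
        total += kl_divergence(p_source, p_adapted)
    return total / n
```

Under these assumptions the loss is zero when the adapted samples keep the same relative similarity structure as the source samples. It grows when the adapted samples collapse toward one another, which is the loss of diversity that overfitting on a few images tends to produce.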
