DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

Abstract

The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating new instances with sufficient variations. We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of our approach through quantitative analysis, including automatic evaluation and human assessment. Project website: https://briannlongzhao.github.io/DreamDistribution
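To make "sampling prompts from the learned distribution" concrete, the sketch below shows one way such sampling could work. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the K learned soft prompts are summarized by a diagonal Gaussian (per-dimension mean and standard deviation), that new prompts are drawn with the reparameterization `mean + scale * std * eps`, that the hypothetical `scale` parameter controls the amount of variation, and that mixing two distributions is a convex combination of their means and standard deviations. The function names are hypothetical, and the toy 2-D lists stand in for real text-encoder embeddings.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Gaussian = Tuple[Vector, Vector]  # (per-dimension mean, per-dimension std)


def fit_diag_gaussian(prompts: Sequence[Sequence[float]]) -> Gaussian:
    """Assumption: summarize K learned prompt embeddings by a diagonal Gaussian."""
    dims = list(zip(*prompts))  # transpose K x D -> D x K
    mean = [statistics.fmean(d) for d in dims]
    std = [statistics.pstdev(d) for d in dims]  # population std over the K prompts
    return mean, std


def sample_prompt(mean: Vector, std: Vector, scale: float = 1.0,
                  rng: Optional[random.Random] = None) -> Vector:
    """Reparameterized draw: mean + scale * std * eps with eps ~ N(0, 1).

    `scale` is a hypothetical knob: 0 returns the mean, larger values add variation.
    """
    if rng is None:
        rng = random.Random()
    return [m + scale * s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def mix(dist_a: Gaussian, dist_b: Gaussian, weight: float) -> Gaussian:
    """Hypothetical mixing rule: convex combination of means and stds."""
    (mean_a, std_a), (mean_b, std_b) = dist_a, dist_b
    mean = [weight * a + (1 - weight) * b for a, b in zip(mean_a, mean_b)]
    std = [weight * a + (1 - weight) * b for a, b in zip(std_a, std_b)]
    return mean, std


if __name__ == "__main__":
    dist_a = fit_diag_gaussian([[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]])
    dist_b = fit_diag_gaussian([[0.0, 0.0], [0.0, 4.0]])
    print(sample_prompt(*dist_a, scale=0.5, rng=random.Random(0)))
    print(mix(dist_a, dist_b, weight=0.5))
```

In this sketch a dimension on which all learned prompts agree (zero standard deviation) stays fixed in every sample, while dimensions where the prompts differ are where sampled variation appears. A diagonal Gaussian is chosen here only because it is the simplest distribution that supports both a variation knob and a straightforward mixing rule.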