Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
July 21, 2023
Authors: Jian Ma, Junhao Liang, Chen Chen, Haonan Lu
cs.AI
Abstract
Recent progress in personalized image generation using diffusion models has
been significant. However, development in the area of open-domain and
non-fine-tuning personalized image generation is proceeding rather slowly. In
this paper, we propose Subject-Diffusion, a novel open-domain personalized
image generation model that, in addition to not requiring test-time
fine-tuning, also only requires a single reference image to support
personalized generation of single or multiple subjects in any domain. Firstly, we
construct an automatic data labeling tool and use the LAION-Aesthetics dataset
to construct a large-scale dataset consisting of 76M images and their
corresponding subject detection bounding boxes, segmentation masks and text
descriptions. Secondly, we design a new unified framework that combines text
and image semantics by incorporating coarse location and fine-grained reference
image control to maximize subject fidelity and generalization. Furthermore, we
adopt an attention control mechanism to support multi-subject generation.
Extensive qualitative and quantitative results demonstrate that our method
outperforms other SOTA frameworks in single-subject, multi-subject, and
human-customized image generation. Please refer to our project page:
https://oppo-mente-lab.github.io/subject_diffusion/
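The attention control idea mentioned in the abstract can be illustrated with a minimal sketch: in a diffusion U-Net's cross-attention, each image location (query) attends over text tokens (keys/values), and a boolean mask derived from each subject's detected bounding box blocks a subject's tokens from being attended outside its region, reducing attribute mixing between subjects. This is a hypothetical illustration, not the authors' exact mechanism; `masked_cross_attention` and the `allow` mask layout are assumptions for exposition.

```python
import numpy as np

def masked_cross_attention(q_img, k_txt, v_txt, allow):
    """Cross-attention from image locations (queries) to text tokens
    (keys/values) with per-location token masking.

    q_img: (n_loc, d) image-location queries
    k_txt, v_txt: (n_tok, d) text-token keys and values
    allow: (n_loc, n_tok) boolean mask; False blocks a subject token
           from being attended at locations outside its region.
    Hypothetical sketch of attention control, not the paper's exact code.
    """
    d = q_img.shape[-1]
    scores = q_img @ k_txt.T / np.sqrt(d)          # (n_loc, n_tok)
    scores = np.where(allow, scores, -np.inf)      # mask disallowed pairs
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)                             # exp(-inf) -> exactly 0
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v_txt, w

# Usage: block token 2 (e.g., the second subject's word) at location 0.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(3, 8))
v = rng.normal(size=(3, 8))
allow = np.ones((4, 3), dtype=bool)
allow[0, 2] = False
out, w = masked_cross_attention(q, k, v, allow)
```

In practice the mask would be built by rasterizing each subject's bounding box or segmentation mask (which the paper's 76M-image dataset provides) onto the attention-map resolution.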