Generate Anything Anywhere in Any Scene
June 29, 2023
Authors: Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee
cs.AI
Abstract
Text-to-image diffusion models have attracted considerable interest due to
their wide applicability across diverse fields. However, challenges persist in
creating controllable models for personalized object generation. In this paper,
we first identify the entanglement issues in existing personalized generative
models, and then propose a straightforward and efficient data augmentation
training strategy that guides the diffusion model to focus solely on object
identity. By inserting the plug-and-play adapter layers from a pre-trained
controllable diffusion model, our model obtains the ability to control the
location and size of each generated personalized object. During inference, we
propose a regionally-guided sampling technique to maintain the quality and
fidelity of the generated images. Our method achieves comparable or superior
fidelity for personalized objects, yielding a robust, versatile, and
controllable text-to-image diffusion model that is capable of generating
realistic and personalized images. Our approach demonstrates significant
potential for various applications, such as those in art, entertainment, and
advertising design.
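To make the disentanglement idea concrete, below is a minimal sketch of the kind of rescale-and-reposition data augmentation the abstract alludes to: each training photo of the personalized object is randomly resized and pasted at a random position on a neutral canvas, so fine-tuning cannot tie the object's identity to any particular location or size. The function name, canvas size, scale range, and fill color are illustrative assumptions, not the authors' released code; location and size control at inference would then come from the adapter layers of a pre-trained controllable diffusion model, which this sketch does not cover.

```python
# Illustrative sketch only: parameter ranges and helper names are assumptions,
# not the authors' implementation.
import random
from PIL import Image

def rescale_and_reposition(subject: Image.Image,
                           canvas_size: int = 512,
                           scale_range: tuple = (0.3, 1.0)) -> Image.Image:
    """Randomly rescale the subject image and paste it at a random position
    on a neutral canvas, decoupling object identity from location and size."""
    # Pick a random target extent as a fraction of the canvas.
    scale = random.uniform(*scale_range)
    target = int(canvas_size * scale)

    # Fit the subject inside a target x target box, preserving aspect ratio.
    ratio = min(target / subject.width, target / subject.height)
    new_w = max(1, int(subject.width * ratio))
    new_h = max(1, int(subject.height * ratio))
    resized = subject.resize((new_w, new_h), Image.BICUBIC)

    # Paste at a uniformly random offset on a gray background.
    canvas = Image.new("RGB", (canvas_size, canvas_size), (127, 127, 127))
    offset = (random.randint(0, canvas_size - new_w),
              random.randint(0, canvas_size - new_h))
    canvas.paste(resized, offset)
    return canvas

# Example usage: augment a subject photo before each fine-tuning step.
# augmented = rescale_and_reposition(Image.open("subject.jpg"))
```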