Generate Anything Anywhere in Any Scene

June 29, 2023
Authors: Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee
cs.AI

Abstract

Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields. However, challenges persist in creating controllable models for personalized object generation. In this paper, we first identify the entanglement issues in existing personalized generative models, and then propose a straightforward and efficient data augmentation training strategy that guides the diffusion model to focus solely on object identity. By inserting the plug-and-play adapter layers from a pre-trained controllable diffusion model, our model obtains the ability to control the location and size of each generated personalized object. During inference, we propose a regionally-guided sampling technique to maintain the quality and fidelity of the generated images. Our method achieves comparable or superior fidelity for personalized objects, yielding a robust, versatile, and controllable text-to-image diffusion model that is capable of generating realistic and personalized images. Our approach demonstrates significant potential for various applications, such as those in art, entertainment, and advertising design.
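The abstract leaves the augmentation strategy at a high level, but its core idea can be pictured with a minimal sketch: if the subject is randomly rescaled and repositioned during fine-tuning, the model cannot tie the subject's identity to any fixed location or size. The function below is an illustrative assumption (the name augment_subject, the scale range, and the blank canvas are choices made here for clarity), not the authors' implementation.

    # Minimal sketch (assumed, not the paper's code) of the data-augmentation
    # idea: paste the subject at a random scale and position on a blank
    # canvas, so identity is disentangled from location and size.
    import random

    from PIL import Image


    def augment_subject(subject: Image.Image,
                        canvas_size: int = 512,
                        scale_range: tuple[float, float] = (0.3, 1.0)) -> Image.Image:
        """Return a training image with the subject randomly rescaled and placed."""
        scale = random.uniform(*scale_range)
        # Clamp so the resized subject always fits inside the canvas.
        w = min(canvas_size, max(1, int(subject.width * scale)))
        h = min(canvas_size, max(1, int(subject.height * scale)))
        resized = subject.resize((w, h))
        canvas = Image.new("RGB", (canvas_size, canvas_size))
        canvas.paste(resized, (random.randint(0, canvas_size - w),
                               random.randint(0, canvas_size - h)))
        return canvas

At inference, placement is then supplied explicitly rather than randomized: per the abstract, the bounding box passed to the plug-and-play adapter layers fixes where and how large the personalized object appears, while the regionally-guided sampling step maintains the quality and fidelity of the rest of the image.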