CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
October 30, 2023
Authors: Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan
cs.AI
Abstract
Incorporating a customized object into image generation is an attractive
capability of text-to-image models. However, existing
optimization-based and encoder-based methods are hindered by drawbacks such as
time-consuming optimization, insufficient identity preservation, and a
prevalent copy-pasting effect. To overcome these limitations, we introduce
CustomNet, a novel object customization approach that explicitly incorporates
3D novel view synthesis capabilities into the object customization process.
This integration facilitates the adjustment of spatial position relationships
and viewpoints, yielding diverse outputs while effectively preserving object
identity. Moreover, we introduce dedicated designs to enable location control
and flexible background control through textual descriptions or specific
user-defined images, overcoming the limitations of existing 3D novel view
synthesis methods. We further leverage a dataset construction pipeline that can
better handle real-world objects and complex backgrounds. Equipped with these
designs, our method facilitates zero-shot object customization without
test-time optimization, offering simultaneous control over the viewpoints,
location, and background. As a result, our CustomNet ensures enhanced identity
preservation and generates diverse, harmonious outputs.
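The abstract describes three simultaneous control axes: viewpoint, location, and background (text or a user image). As a minimal sketch of how such conditioning inputs might be structured and validated on the client side, the class and function below are hypothetical illustrations; they are not part of the CustomNet codebase, and all names and value ranges are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class CustomizationCondition:
    """Hypothetical container for the three control axes described in the paper."""
    azimuth: float                      # viewpoint rotation around the object, in degrees
    elevation: float                    # viewpoint elevation, in degrees
    location: Tuple[int, int, int, int] # (x, y, w, h) box placing the object in the canvas
    background: Union[str, bytes]       # text description or user-provided image data

def validate(cond: CustomizationCondition) -> bool:
    """Check that the conditioning values fall in plausible ranges (assumed, not from the paper)."""
    x, y, w, h = cond.location
    return (
        -180.0 <= cond.azimuth <= 180.0
        and -90.0 <= cond.elevation <= 90.0
        and x >= 0 and y >= 0 and w > 0 and h > 0
    )
```

For example, `validate(CustomizationCondition(30.0, 10.0, (0, 0, 256, 256), "a sandy beach"))` checks a request that rotates the object 30 degrees, places it in a 256x256 box, and sets the background from text; the actual model inputs may differ.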