CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
October 30, 2023
Authors: Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan
cs.AI
Abstract
Incorporating a customized object into image generation is an attractive
capability of text-to-image models. However, existing
optimization-based and encoder-based methods are hindered by drawbacks such as
time-consuming optimization, insufficient identity preservation, and a
prevalent copy-pasting effect. To overcome these limitations, we introduce
CustomNet, a novel object customization approach that explicitly incorporates
3D novel view synthesis capabilities into the object customization process.
This integration facilitates the adjustment of spatial position relationships
and viewpoints, yielding diverse outputs while effectively preserving object
identity. Moreover, we introduce dedicated designs to enable location control
and flexible background control through textual descriptions or specific
user-defined images, overcoming the limitations of existing 3D novel view
synthesis methods. We further leverage a dataset construction pipeline that can
better handle real-world objects and complex backgrounds. Equipped with these
designs, our method facilitates zero-shot object customization without
test-time optimization, offering simultaneous control over the viewpoints,
location, and background. As a result, our CustomNet ensures enhanced identity
preservation and generates diverse, harmonious outputs.
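The abstract describes three simultaneous control axes: viewpoint, location, and background (text or a user image). As a minimal sketch of how such conditioning inputs might be structured and validated on the client side, the class and function below are hypothetical illustrations; they are not part of the CustomNet codebase, and all names and value ranges are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class CustomizationCondition:
    """Hypothetical container for the three control axes described in the paper."""
    azimuth: float                      # viewpoint rotation around the object, in degrees
    elevation: float                    # viewpoint elevation, in degrees
    location: Tuple[int, int, int, int] # (x, y, w, h) box placing the object in the canvas
    background: Union[str, bytes]       # text description or user-provided image data

def validate(cond: CustomizationCondition) -> bool:
    """Check that the conditioning values fall in plausible ranges (assumed, not from the paper)."""
    x, y, w, h = cond.location
    return (
        -180.0 <= cond.azimuth <= 180.0
        and -90.0 <= cond.elevation <= 90.0
        and x >= 0 and y >= 0 and w > 0 and h > 0
    )
```

For example, `validate(CustomizationCondition(30.0, 10.0, (0, 0, 256, 256), "a sandy beach"))` checks a request that rotates the object 30 degrees, places it in a 256x256 box, and sets the background from text; the actual model inputs may differ.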