CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
October 30, 2023
Authors: Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan
cs.AI
Abstract
Incorporating a customized object into generated images is an attractive capability for text-to-image models. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce dedicated designs that enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that better handles real-world objects and complex backgrounds. Equipped with these designs, our method achieves zero-shot object customization without test-time optimization, offering simultaneous control over viewpoint, location, and background. As a result, CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.
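
To make the control interface described above concrete, here is a minimal sketch of how the four conditioning signals (object identity, viewpoint, location, background) could be packed into tokens for a diffusion backbone's cross-attention. This is not the authors' released code; all module names, feature dimensions, and pose/box parameterizations below are illustrative assumptions.

```python
# Hypothetical sketch of CustomNet-style conditioning: an object image,
# a target viewpoint, a target location, and a background (text or image)
# are encoded into one token sequence for the denoising U-Net.
import torch
import torch.nn as nn


class CustomNetConditioner(nn.Module):
    """Packs the four controls into a single conditioning tensor (assumed design)."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.object_encoder = nn.Linear(512, dim)   # stand-in for a frozen image encoder
        self.pose_embed = nn.Linear(4, dim)         # e.g. (elevation, azimuth, distance, roll)
        self.box_embed = nn.Linear(4, dim)          # normalized (x0, y0, x1, y1) location
        self.background_proj = nn.Linear(768, dim)  # text tokens or an encoded bg image

    def forward(self, object_feat, pose, box, background_feat):
        tokens = torch.stack(
            [
                self.object_encoder(object_feat),  # identity-preserving object token
                self.pose_embed(pose),             # viewpoint-control token
                self.box_embed(box),               # location-control token
            ],
            dim=1,
        )
        # Background tokens (from a caption or a user-supplied image) are
        # concatenated so the denoiser can harmonize object and scene.
        return torch.cat([tokens, self.background_proj(background_feat)], dim=1)


# Usage: one zero-shot customization request (no test-time optimization).
cond = CustomNetConditioner()
c = cond(
    object_feat=torch.randn(1, 512),            # encoded reference object
    pose=torch.tensor([[0.2, 1.1, 1.5, 0.0]]),  # target viewpoint
    box=torch.tensor([[0.3, 0.4, 0.7, 0.9]]),   # where to place the object
    background_feat=torch.randn(1, 77, 768),    # e.g. text-encoder tokens
)
print(c.shape)  # torch.Size([1, 80, 768]) -> consumed via cross-attention
```

The point of the sketch is the interface, not the internals: because viewpoint, location, and background enter as separate conditioning tokens, each can be varied independently at inference time, which is what enables the simultaneous control the abstract claims.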