TryOnDiffusion: A Tale of Two UNets
June 14, 2023
Authors: Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman
cs.AI
Abstract
Given two images depicting a person and a garment worn by another person, our
goal is to generate a visualization of how the garment might look on the input
person. A key challenge is to synthesize a photorealistic detail-preserving
visualization of the garment, while warping the garment to accommodate a
significant body pose and shape change across the subjects. Previous methods
either focus on garment detail preservation without effectively handling pose
and shape variation, or allow try-on with the desired shape and pose but lack
garment
details. In this paper, we propose a diffusion-based architecture that unifies
two UNets (referred to as Parallel-UNet), which allows us to preserve garment
details and warp the garment for significant pose and body change in a single
network. The key ideas behind Parallel-UNet are: 1) the garment is warped
implicitly via a cross-attention mechanism, and 2) garment warping and person
blending happen as part of a unified process rather than as a sequence of two
separate
tasks. Experimental results indicate that TryOnDiffusion achieves
state-of-the-art performance both qualitatively and quantitatively.
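The abstract gives enough to sketch the core mechanism in code. Below is a minimal, hypothetical PyTorch sketch of the Parallel-UNet idea: two parallel branches (one for the noisy person image, one for the garment) whose feature maps communicate through cross-attention, so garment detail is transferred, and implicitly warped, by attention rather than by an explicit flow field. All class names, layer sizes, and the simplified single-scale structure are illustrative assumptions; the paper's actual architecture is a full multi-scale UNet pair inside a diffusion pipeline and is considerably more elaborate.

# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Person features attend to garment features (implicit warping)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, person_feat, garment_feat):
        # Flatten spatial maps to token sequences: (B, C, H, W) -> (B, H*W, C)
        b, c, h, w = person_feat.shape
        q = person_feat.flatten(2).transpose(1, 2)
        kv = garment_feat.flatten(2).transpose(1, 2)
        out, _ = self.attn(q, kv, kv)  # queries: person; keys/values: garment
        return out.transpose(1, 2).reshape(b, c, h, w)

class TinyEncoder(nn.Module):
    """Stand-in for one UNet branch (assumption: the real model is a full UNet)."""
    def __init__(self, in_ch: int, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.SiLU(),
        )

    def forward(self, x):
        return self.net(x)

class ParallelUNetSketch(nn.Module):
    """Noisy person image and garment image are processed in parallel branches;
    garment detail flows into the person branch only through cross-attention,
    so warping and blending happen within one unified denoising pass."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.person_branch = TinyEncoder(3, dim)
        self.garment_branch = TinyEncoder(3, dim)
        self.cross_attn = CrossAttention(dim)
        self.to_eps = nn.Conv2d(dim, 3, 3, padding=1)  # predict diffusion noise

    def forward(self, noisy_person, garment):
        p = self.person_branch(noisy_person)
        g = self.garment_branch(garment)
        p = p + self.cross_attn(p, g)  # residual cross-attention fusion
        return self.to_eps(p)

if __name__ == "__main__":
    model = ParallelUNetSketch()
    noisy_person = torch.randn(1, 3, 32, 32)  # x_t in a diffusion process
    garment = torch.randn(1, 3, 32, 32)
    eps_pred = model(noisy_person, garment)
    print(eps_pred.shape)  # torch.Size([1, 3, 32, 32])

Note the design choice the abstract highlights: because the attention queries come from person-space features and the keys/values from garment-space features, each output location can gather garment detail from wherever it lies in the garment image, which is what makes the warp "implicit" and lets warping and blending be trained jointly rather than as two cascaded stages.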