TryOnDiffusion: A Tale of Two UNets
June 14, 2023
作者: Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman
cs.AI
Abstract
Given two images, one depicting a person and the other a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping it to accommodate significant body pose and shape changes across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body changes in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
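
To make the cross-attention idea concrete, below is a minimal sketch, not the authors' implementation, of how a person-branch feature map could attend to a garment-branch feature map so that the garment is aligned implicitly by attention rather than by an explicit warping field. It assumes standard PyTorch; the module and variable names (GarmentCrossAttention, person_feat, garment_feat) are hypothetical illustrations, not names from the paper.

```python
# Hedged sketch of implicit garment warping via cross-attention:
# person features act as queries, garment features as keys/values.
import torch
import torch.nn as nn

class GarmentCrossAttention(nn.Module):
    """Person-branch features attend to garment-branch features."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, person_feat: torch.Tensor,
                garment_feat: torch.Tensor) -> torch.Tensor:
        # person_feat:  (B, C, H, W) from the person UNet branch
        # garment_feat: (B, C, H, W) from the garment UNet branch
        b, c, h, w = person_feat.shape
        q = self.norm_q(person_feat.flatten(2).transpose(1, 2))    # (B, HW, C)
        kv = self.norm_kv(garment_feat.flatten(2).transpose(1, 2)) # (B, HW, C)
        out, _ = self.attn(q, kv, kv)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        # Residual connection keeps the person's structure while the
        # attention output injects garment detail at matching locations.
        return person_feat + out

# Usage: fuse garment features into the person branch at one UNet resolution.
block = GarmentCrossAttention(dim=256)
person = torch.randn(1, 256, 32, 32)
garment = torch.randn(1, 256, 32, 32)
fused = block(person, garment)
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```

Because attention matches person queries against garment keys at every spatial location, the correspondence between the two images is learned rather than computed as a flow field, which is one way to read the abstract's claim that warping and blending happen in a single unified process.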