
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

November 2, 2025
Authors: Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Dengyang Jiang, Zanyi Wang, Dawei Leng, Yuhui Yin
cs.AI

Abstract

We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference that directly fits the target garment onto the person image while incorporating reference images to enhance try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, densepose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy that enables simple inference with only the source image and the target garment as inputs. Our model generates try-on results without masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same garment to better preserve its texture and fine-grained details. This mechanism is analogous to how humans consult reference models when choosing outfits, and it yields a more realistic, higher-quality dressing effect. We enrich the training data with supplementary reference and unpaired person images to support these capabilities. We evaluate EVTAR on two widely used benchmarks and across diverse tasks, and the results consistently validate the effectiveness of our approach.
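To make the mask-free interface described in the abstract concrete, here is a minimal Python sketch of what inference could look like: only a source person image and a target garment image are required, and an unpaired reference image of someone else wearing the same garment is optional. Everything here (the EVTARPipeline class, the try_on method, the checkpoint name, and the file paths) is hypothetical and illustrative; it is not the authors' actual API, and the stub does not implement the model itself.

```python
# Hypothetical sketch of EVTAR-style mask-free try-on inference, based only on
# the abstract. All names and paths are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

from PIL import Image


@dataclass
class EVTARPipeline:
    """Stub for an end-to-end try-on model.

    Per the abstract, no masks, densepose, segmentation maps, or body
    keypoints are needed at inference time.
    """

    checkpoint: str  # hypothetical path to trained weights

    def try_on(
        self,
        person: Image.Image,                       # source person image
        garment: Image.Image,                      # target garment image
        reference: Optional[Image.Image] = None,   # optional unpaired reference
    ) -> Image.Image:
        # A real implementation would run the two-stage-trained generator here,
        # using `reference` (a different person wearing the same garment) to
        # preserve texture and fine-grained detail. This stub just echoes the
        # input so the sketch stays runnable.
        return person


if __name__ == "__main__":
    pipe = EVTARPipeline(checkpoint="evtar.ckpt")  # hypothetical weights file
    person = Image.new("RGB", (768, 1024))         # placeholder inputs
    garment = Image.new("RGB", (768, 1024))
    result = pipe.try_on(person, garment)          # reference is optional
    result.save("try_on_result.png")
```

The design point the sketch highlights is the simplicity of the call signature: unlike pipelines that require agnostic person images or pose annotations, the only mandatory inputs are the two images, with the reference acting purely as an optional quality enhancer.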