EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
November 2, 2025
Authors: Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Dengyang Jiang, Zanyi Wang, Dawei Leng, Yuhui Yin
cs.AI
Abstract
We propose EVTAR, an End-to-End Virtual Try-on model with Additional
Reference, which directly fits the target garment onto the person image while
incorporating reference images to enhance try-on accuracy. Most existing
virtual try-on approaches rely on complex inputs such as agnostic person
images, human pose, densepose, or body keypoints, making them labor-intensive
and impractical for real-world applications. In contrast, EVTAR adopts a
two-stage training strategy, enabling simple inference with only the source
image and the target garment as inputs. Our model generates try-on results without
masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional
reference images of different individuals wearing the same clothes to better
preserve garment texture and fine-grained details. This mechanism is analogous to
how people consult reference models when choosing outfits, and it produces a
more realistic, higher-quality dressing result. We enrich the training data
with supplementary references and unpaired person images to support these
capabilities. We evaluate EVTAR on two widely used benchmarks and diverse
tasks, and the results consistently validate the effectiveness of our approach.
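To make the input contract described above concrete, the following is a minimal Python sketch of what inference would look like under the paper's description: only a source person image, a target garment image, and an optional unpaired reference are supplied, with no masks, densepose, or keypoints. The names `TryOnInputs` and `run_tryon` are hypothetical placeholders introduced for illustration, not the authors' released interface.

```python
"""Hypothetical sketch of EVTAR's inference input contract (not the authors' API)."""
from dataclasses import dataclass
from typing import Optional
from PIL import Image


@dataclass
class TryOnInputs:
    # As described in the abstract, EVTAR needs only these fields at inference time.
    person: Image.Image                       # source image of the person to be dressed
    garment: Image.Image                      # in-shop image of the target garment
    reference: Optional[Image.Image] = None   # optional: same garment worn by a different person
    # Mask-based baselines would additionally require inputs such as an agnostic
    # person image, a densepose map, body keypoints, or a segmentation mask.


def run_tryon(inputs: TryOnInputs) -> Image.Image:
    """Stand-in for the EVTAR forward pass; a trained model would return the dressed person."""
    raise NotImplementedError("placeholder for the actual EVTAR model")


if __name__ == "__main__":
    person = Image.open("person.jpg").convert("RGB")
    garment = Image.open("garment.jpg").convert("RGB")
    reference = Image.open("reference_model.jpg").convert("RGB")  # helps preserve texture detail
    try:
        result = run_tryon(TryOnInputs(person, garment, reference))
        result.save("tryon_result.png")
    except NotImplementedError:
        pass  # sketch only: plug in a trained checkpoint to produce a real result
```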