EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
November 2, 2025
Authors: Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Dengyang Jiang, Zanyi Wang, Dawei Leng, Yuhui Yin
cs.AI
Abstract
We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, which directly fits the target garment onto the person image while incorporating reference images to improve try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, DensePose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy that enables simple inference with only the source image and the target garment as inputs. Our model generates try-on results without masks, DensePose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to better preserve garment texture and fine-grained details. This mechanism mirrors how people consult images of models wearing a garment when choosing outfits, leading to more realistic, higher-quality dressing results. To support these capabilities, we enrich the training data with supplementary reference images and unpaired person images. We evaluate EVTAR on two widely used benchmarks and across diverse tasks, and the results consistently validate the effectiveness of our approach.
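
To make the simplified inference contract concrete, the sketch below shows what a mask-free try-on call could look like. This is a minimal illustration under our own assumptions: the EVTAR class, the checkpoint_path argument, and the try_on signature are hypothetical and not taken from the paper; only the input/output contract (a person image plus a garment image, an optional reference, and no masks, DensePose, or keypoints) reflects the abstract.

```python
from typing import Optional
from PIL import Image


class EVTAR:
    """Hypothetical wrapper around a trained EVTAR checkpoint.

    All names here are illustrative; the paper does not specify a public API.
    """

    def __init__(self, checkpoint_path: str):
        # Model loading is elided in this sketch.
        self.checkpoint_path = checkpoint_path

    def try_on(
        self,
        person: Image.Image,                      # source person image
        garment: Image.Image,                     # target garment image
        reference: Optional[Image.Image] = None,  # optional: another person wearing the same garment
    ) -> Image.Image:
        # Per the abstract, no agnostic image, mask, DensePose map, or body
        # keypoints are required. The two-stage training makes the reference
        # optional at inference time; when supplied, it helps preserve garment
        # texture and fine-grained details.
        raise NotImplementedError("forward pass elided in this sketch")


# Minimal usage: only two inputs are strictly required, unlike
# mask- or pose-conditioned pipelines.
# model = EVTAR("evtar.ckpt")
# result = model.try_on(Image.open("person.jpg"), Image.open("garment.jpg"))
```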