EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

November 2, 2025
Authors: Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Dengyang Jiang, Zanyi Wang, Dawei Leng, Yuhui Yin
cs.AI

Abstract

We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, which directly fits the target garment onto the person image while incorporating reference images to improve try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, densepose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy that enables simple inference with only the source image and the target garment as inputs. Our model generates try-on results without masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to better preserve garment texture and fine-grained detail. This mechanism is analogous to how humans consult reference models when choosing outfits, yielding a more realistic and higher-quality dressing effect. To support these capabilities, we enrich the training data with supplementary reference images and unpaired person images. We evaluate EVTAR on two widely used benchmarks and diverse tasks, and the results consistently validate the effectiveness of our approach.
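To make the mask-free inference contract described above concrete, here is a minimal sketch assuming a PyTorch-style generator. All names (EVTARPipeline, try_on, the single-convolution stand-in generator) are illustrative assumptions, not the paper's actual implementation; the point is only that inference needs just the source person image and the target garment, with the unpaired reference as an optional third conditioning input.

```python
# Minimal, hypothetical sketch of a mask-free try-on interface (not the
# paper's actual code). Inference takes only the source person image and
# the target garment; an unpaired reference image is optional.
from typing import Optional

import torch
import torch.nn as nn


class EVTARPipeline(nn.Module):  # name is illustrative, not from the paper
    def __init__(self, generator: nn.Module):
        super().__init__()
        self.generator = generator  # learned try-on generator (stand-in here)

    @torch.no_grad()
    def try_on(
        self,
        person: torch.Tensor,    # (B, 3, H, W) source person image
        garment: torch.Tensor,   # (B, 3, H, W) target garment image
        reference: Optional[torch.Tensor] = None,  # optional unpaired reference
    ) -> torch.Tensor:
        # No agnostic mask, densepose, or keypoints: conditioning is raw images.
        if reference is None:
            # A zero placeholder keeps the input channel count fixed when no
            # reference is supplied (one simple way to make it optional).
            reference = torch.zeros_like(garment)
        return self.generator(torch.cat([person, garment, reference], dim=1))


# Toy usage with a single-conv stand-in generator, just to show the shapes.
pipe = EVTARPipeline(nn.Conv2d(9, 3, kernel_size=3, padding=1))
person = torch.rand(1, 3, 256, 192)
garment = torch.rand(1, 3, 256, 192)
result = pipe.try_on(person, garment)                         # mask-free
result_ref = pipe.try_on(person, garment, torch.rand(1, 3, 256, 192))
```

The zero-filled placeholder is just one simple way to keep the reference optional under a fixed input width; the paper's two-stage training presumably learns its own conditioning scheme for the reference image.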