CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation

January 20, 2025
Authors: Zheng Chong, Wenqing Zhang, Shiyue Zhang, Jun Zheng, Xiao Dong, Haoxiang Li, Yiling Wu, Dongmei Jiang, Xiaodan Liang
cs.AI

Abstract

Virtual try-on (VTON) technology has gained attention for its potential to transform online retail by enabling realistic clothing visualization in images and videos. However, most existing methods struggle to achieve high-quality results across image and video try-on tasks, especially in long video scenarios. In this work, we introduce CatV2TON, a simple and effective vision-based virtual try-on (V2TON) method that supports both image and video try-on tasks with a single diffusion transformer model. By temporally concatenating garment and person inputs and training on a mix of image and video datasets, CatV2TON achieves robust try-on performance across static and dynamic settings. For efficient long-video generation, we propose an overlapping clip-based inference strategy that uses sequential frame guidance and Adaptive Clip Normalization (AdaCN) to maintain temporal consistency with reduced resource demands. We also present ViViD-S, a refined video try-on dataset, obtained by filtering back-facing frames and applying 3D mask smoothing for enhanced temporal consistency. Comprehensive experiments demonstrate that CatV2TON outperforms existing methods in both image and video try-on tasks, offering a versatile and reliable solution for realistic virtual try-on across diverse scenarios.
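
The overlapping clip-based inference idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give clip lengths, overlap sizes, or the definition of AdaCN, so the function names `clip_windows` and `match_stats`, their parameters, and the AdaIN-style mean and standard deviation matching are all assumptions made here for illustration.

```python
from statistics import fmean, pstdev


def clip_windows(num_frames, clip_len, overlap):
    """Return (start, end) frame ranges covering the video.

    Each clip after the first reuses `overlap` frames from the end of the
    previous clip, so those frames can guide the next clip. The clip length
    and overlap are hypothetical parameters, not values from the paper.
    """
    if not 0 <= overlap < clip_len:
        raise ValueError("overlap must satisfy 0 <= overlap < clip_len")
    if num_frames <= 0:
        return []
    stride = clip_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + clip_len, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


def match_stats(values, reference, eps=1e-6):
    """Shift and scale `values` so their mean and standard deviation match
    those of `reference`.

    This is an AdaIN-style stand-in (an assumption); the abstract names
    AdaCN but does not say how it is computed.
    """
    mu_v, sd_v = fmean(values), pstdev(values)
    mu_r, sd_r = fmean(reference), pstdev(reference)
    return [(v - mu_v) / (sd_v + eps) * sd_r + mu_r for v in values]


if __name__ == "__main__":
    # A 10-frame video split into 4-frame clips that share 1 frame.
    print(clip_windows(10, 4, 1))
    # Align toy feature values of a new clip to a reference clip.
    print(match_stats([0.0, 2.0], [10.0, 14.0]))
```

Processing one short clip at a time keeps memory use bounded regardless of video length, while the shared frames give each clip a reference from the one before it. That is the general motivation the abstract gives for the overlapping strategy.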
