Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

August 6, 2025
Authors: Seungyong Lee, Jeong-gi Kwak
cs.AI

Abstract

Virtual try-on aims to synthesize a realistic image of a person wearing a target garment, but accurately modeling garment-body correspondence remains a persistent challenge, especially under pose and appearance variation. In this paper, we propose Voost, a unified and scalable framework that jointly learns virtual try-on and try-off with a single diffusion transformer. By modeling both tasks jointly, Voost enables each garment-person pair to supervise both directions and supports flexible conditioning over generation direction and garment category, enhancing garment-body relational reasoning without task-specific networks, auxiliary losses, or additional labels. In addition, we introduce two inference-time techniques: attention temperature scaling for robustness to resolution or mask variation, and self-corrective sampling that leverages bidirectional consistency between tasks. Extensive experiments demonstrate that Voost achieves state-of-the-art results on both try-on and try-off benchmarks, consistently outperforming strong baselines in alignment accuracy, visual fidelity, and generalization.
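
The abstract names attention temperature scaling but does not give its formula or say how the temperature is chosen. As a rough illustration only, the sketch below shows the generic idea of adding a temperature to scaled dot-product attention: the logits are divided by an extra factor `tau`, so that `tau < 1` sharpens the attention distribution and `tau > 1` flattens it. The function names, the parameter `tau`, and the toy vectors are all assumptions made for this example, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, tau=1.0):
    """Attention weights of one query over a list of keys.

    Generic form: softmax(q . k / (tau * sqrt(d))).
    `tau` is a hypothetical temperature; tau = 1 gives standard attention.
    """
    d = len(query)
    scale = 1.0 / (tau * math.sqrt(d))
    logits = [scale * sum(qi * ki for qi, ki in zip(query, key)) for key in keys]
    return softmax(logits)


def entropy(ps):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in ps if p > 0)


# Toy data: one query and three keys of dimension 4.
query = [1.0, 0.0, 1.0, 0.0]
keys = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]

sharp = attention_weights(query, keys, tau=0.5)    # more peaked
default = attention_weights(query, keys, tau=1.0)  # standard attention
flat = attention_weights(query, keys, tau=2.0)     # more uniform
```

One plausible reason such a knob helps at inference time is that the number of attended tokens changes with image resolution or mask size, which shifts how concentrated the attention weights are relative to training; a temperature offers a way to compensate. How Voost actually sets the temperature is not stated in the abstract.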