Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

May 27, 2025
Authors: Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe
cs.AI

Abstract

While virtual try-on (VTON) systems aim to render a garment onto a target person image, this paper tackles the novel task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals. Unlike VTON, which must resolve diverse pose and style variations, VTOFF benefits from a consistent and well-defined output format (typically a flat, lay-down-style representation of the garment), making it a promising tool for data generation and dataset enhancement. However, existing VTOFF approaches face two major limitations: (i) difficulty in disentangling garment features from occlusions and complex poses, which often leads to visual artifacts, and (ii) restricted applicability to single-category garments (e.g., upper-body clothes only), which limits generalization. To address these challenges, we present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone with a modified multimodal attention mechanism for robust garment feature extraction. The architecture is designed to receive garment information from multiple modalities, such as images, text, and masks, so that it can operate in a multi-category setting. Finally, we propose an additional alignment module to further refine the generated visual details. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state of the art on the VTOFF task, significantly improving both visual quality and fidelity to the target garments.