Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

May 27, 2025
Authors: Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe
cs.AI

Abstract

While virtual try-on (VTON) systems aim to render a garment onto a target person image, this paper tackles the novel task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals. Unlike VTON, which must resolve diverse pose and style variations, VTOFF benefits from a consistent and well-defined output format -- typically a flat, lay-down-style representation of the garment -- making it a promising tool for data generation and dataset enhancement. However, existing VTOFF approaches face two major limitations: (i) difficulty in disentangling garment features from occlusions and complex poses, often leading to visual artifacts, and (ii) restricted applicability to single-category garments (e.g., upper-body clothes only), limiting generalization. To address these challenges, we present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone with a modified multimodal attention mechanism for robust garment feature extraction. Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Finally, we propose an additional alignment module to further refine the generated visual details. Experiments on VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task, significantly improving both visual quality and fidelity to the target garments.
