Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
May 27, 2025
Authors: Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe
cs.AI
Abstract
While virtual try-on (VTON) systems aim to render a garment onto a target
person image, this paper tackles the novel task of virtual try-off (VTOFF),
which addresses the inverse problem: generating standardized product images of
garments from real-world photos of clothed individuals. Unlike VTON, which must
resolve diverse pose and style variations, VTOFF benefits from a consistent and
well-defined output format -- typically a flat, lay-down-style representation
of the garment -- making it a promising tool for data generation and dataset
enhancement. However, existing VTOFF approaches face two major limitations: (i)
difficulty in disentangling garment features from occlusions and complex poses,
often leading to visual artifacts, and (ii) restricted applicability to
single-category garments (e.g., upper-body clothes only), limiting
generalization. To address these challenges, we present Text-Enhanced
MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a
dual DiT-based backbone with a modified multimodal attention mechanism for
robust garment feature extraction. Our architecture is designed to receive
garment information from multiple modalities, such as images, text, and masks,
so that it works in a multi-category setting. Finally, we propose an additional alignment
module to further refine the generated visual details. Experiments on VITON-HD
and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the
VTOFF task, significantly improving both visual quality and fidelity to the
target garments.