Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

Abstract

While virtual try-on (VTON) systems aim to render a garment onto a target person image, this paper tackles the novel task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals. Unlike VTON, which must handle diverse pose and style variations, VTOFF benefits from a consistent and well-defined output format, typically a flat, lay-down-style representation of the garment, which makes it a promising tool for data generation and dataset enhancement. However, existing VTOFF approaches face two major limitations: (i) difficulty in disentangling garment features from occlusions and complex poses, which often leads to visual artifacts, and (ii) restricted applicability to single-category garments (e.g., upper-body clothes only), which limits generalization. To address these challenges, we present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone with a modified multimodal attention mechanism for robust garment feature extraction. The architecture is designed to receive garment information from multiple modalities, such as images, text, and masks, so that it can operate in a multi-category setting. Finally, we propose an additional alignment module to further refine the generated visual details. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state of the art on the VTOFF task, significantly improving both visual quality and fidelity to the target garments.