Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories
March 14, 2026
Authors: Junyao Hu, Zhongwei Cheng, Waikeung Wong, Xingxing Zou
cs.AI
Abstract
Virtual try-on (VTON) has advanced single-garment visualization, yet real-world fashion centers on full outfits that combine multiple garments, accessories, fine-grained categories, layering, and diverse styling, a setting that remains beyond current VTON systems. Existing datasets are category-limited and lack outfit diversity. We introduce Garments2Look, the first large-scale multimodal dataset for outfit-level VTON, comprising 80K many-garments-to-one-look pairs across 40 major categories and 300+ fine-grained subcategories. Each pair includes an outfit with 3-12 reference garment images (average 4.48), an image of a model wearing the outfit, and detailed item-level and try-on textual annotations. To balance authenticity and diversity, we propose a synthesis pipeline that heuristically constructs outfit lists before generating try-on results, subjecting the entire process to strict automated filtering and human validation to ensure data quality. To probe task difficulty, we adapt state-of-the-art VTON methods and general-purpose image editing models to establish baselines. Results show that current methods struggle to try on complete outfits seamlessly and to infer correct layering and styling, leading to misalignment and artifacts.
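The pair structure described above can be sketched in code. This is a minimal, hypothetical schema based only on the constraints the abstract states (3-12 reference garment images per outfit, one model image, per-item and try-on text annotations); the field names and validation logic are illustrative assumptions, not the dataset's actual format.

```python
from dataclasses import dataclass

@dataclass
class OutfitPair:
    """Hypothetical sketch of one Garments2Look many-garments-to-one-look pair."""
    garment_images: list      # paths to 3-12 reference garment images
    model_image: str          # path to the image of a model wearing the outfit
    item_descriptions: list   # one textual annotation per garment item
    tryon_caption: str        # try-on text annotation for the whole look

    def is_valid(self) -> bool:
        # Mirror the abstract's stated per-pair constraints: the garment
        # count must lie in [3, 12] and each item needs a description.
        n = len(self.garment_images)
        return 3 <= n <= 12 and len(self.item_descriptions) == n

pair = OutfitPair(
    garment_images=["coat.jpg", "shirt.jpg", "jeans.jpg", "scarf.jpg"],
    model_image="model.jpg",
    item_descriptions=["wool coat", "white shirt", "blue jeans", "silk scarf"],
    tryon_caption="Layered casual look with a coat worn over a shirt.",
)
print(pair.is_valid())  # True: 4 garments, one description each
```

A real loader would also carry category/subcategory labels for each item, since the dataset spans 40 major categories and 300+ subcategories, but those taxonomies are not enumerated in the abstract.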