Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories
March 14, 2026
Authors: Junyao Hu, Zhongwei Cheng, Waikeung Wong, Xingxing Zou
cs.AI
Abstract
Virtual try-on (VTON) has advanced single-garment visualization, yet real-world fashion centers on full outfits that combine multiple garments, accessories, fine-grained categories, layering, and diverse styling, which remain beyond current VTON systems. Existing datasets cover a limited set of categories and lack outfit diversity. We introduce Garments2Look, the first large-scale multimodal dataset for outfit-level VTON, comprising 80K many-garments-to-one-look pairs across 40 major categories and 300+ fine-grained subcategories. Each pair includes an outfit with 3-12 reference garment images (4.48 on average), an image of a model wearing the outfit, and detailed textual annotations for the items and the try-on. To balance authenticity and diversity, we propose a synthesis pipeline that heuristically constructs outfit lists before generating try-on results, with the entire process subject to strict automated filtering and human validation to ensure data quality. To probe task difficulty, we adapt state-of-the-art (SOTA) VTON methods and general-purpose image editing models to establish baselines. Results show that current methods struggle to try on complete outfits seamlessly and to infer correct layering and styling, leading to misalignment and artifacts.
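The pair structure described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the dataset's actual schema: the class names `GarmentRef` and `LookPair`, every field name, and the helper `mean_refs` are hypothetical. Only the 3-12 reference range and the three components of a pair (reference garments, model image, textual annotations) come from the abstract.

```python
from dataclasses import dataclass
from typing import List

# Reference-count range stated in the abstract (3-12 images per outfit).
MIN_REFS, MAX_REFS = 3, 12


@dataclass
class GarmentRef:
    """One reference item in an outfit. All field names are hypothetical."""
    image_path: str    # path to the reference garment or accessory image
    category: str      # major category label
    subcategory: str   # fine-grained subcategory label
    description: str   # textual annotation for the item


@dataclass
class LookPair:
    """One many-garments-to-one-look pair. All field names are hypothetical."""
    garments: List[GarmentRef]
    look_image_path: str   # image of a model wearing the full outfit
    tryon_text: str        # textual annotation describing the try-on

    def __post_init__(self) -> None:
        # Enforce the reference-count range given in the abstract.
        n = len(self.garments)
        if not MIN_REFS <= n <= MAX_REFS:
            raise ValueError(
                f"expected {MIN_REFS}-{MAX_REFS} reference garments, got {n}"
            )


def mean_refs(pairs: List[LookPair]) -> float:
    """Average number of reference garments per pair."""
    return sum(len(p.garments) for p in pairs) / len(pairs)
```

Computed over the full dataset, a statistic like `mean_refs` would correspond to the reported average of 4.48 reference images per outfit.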