Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories

March 14, 2026
Authors: Junyao Hu, Zhongwei Cheng, Waikeung Wong, Xingxing Zou
cs.AI

Abstract

Virtual try-on (VTON) has advanced single-garment visualization, yet real-world fashion centers on full outfits with multiple garments, accessories, fine-grained categories, layering, and diverse styling, which remain beyond current VTON systems. Existing datasets are category-limited and lack outfit diversity. We introduce Garments2Look, the first large-scale multimodal dataset for outfit-level VTON, comprising 80K many-garments-to-one-look pairs across 40 major categories and 300+ fine-grained subcategories. Each pair includes an outfit with 3-12 reference garment images (4.48 on average), an image of a model wearing the outfit, and detailed item and try-on textual annotations. To balance authenticity and diversity, we propose a synthesis pipeline that heuristically constructs outfit lists before generating try-on results, with the entire process subject to strict automated filtering and human validation to ensure data quality. To probe task difficulty, we adapt state-of-the-art (SOTA) VTON methods and general-purpose image editing models to establish baselines. Results show that current methods struggle to try on complete outfits seamlessly and to infer correct layering and styling, leading to misalignment and artifacts.
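The pair structure described above (3-12 reference garment images, one look image, and item and try-on text) can be pictured with a small record type. This is only an illustrative sketch: the class names `GarmentRef` and `LookPair`, all field names, and the helper `mean_refs` are hypothetical and are not taken from the dataset's actual release format.

```python
from dataclasses import dataclass
from typing import List

# Reference-image range per outfit, as stated in the abstract.
MIN_REFS, MAX_REFS = 3, 12


@dataclass
class GarmentRef:
    """One reference item in an outfit (all field names are hypothetical)."""
    image_path: str    # path to the reference garment or accessory image
    category: str      # one of the major categories
    subcategory: str   # fine-grained subcategory
    description: str   # item-level text annotation


@dataclass
class LookPair:
    """A many-garments-to-one-look pair (hypothetical layout)."""
    garments: List[GarmentRef]
    look_image_path: str   # image of the model wearing the full outfit
    tryon_text: str        # try-on text annotation

    def __post_init__(self) -> None:
        # Reject outfits outside the 3-12 reference range.
        n = len(self.garments)
        if not MIN_REFS <= n <= MAX_REFS:
            raise ValueError(f"expected {MIN_REFS}-{MAX_REFS} references, got {n}")


def mean_refs(pairs: List[LookPair]) -> float:
    """Average number of reference images per pair."""
    return sum(len(p.garments) for p in pairs) / len(pairs)
```

Validating the reference count at construction time keeps malformed outfits out of any downstream statistics, such as the per-pair average that the abstract reports as 4.48.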