Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories

Abstract


Virtual try-on (VTON) has advanced single-garment visualization, yet real-world fashion centers on full outfits with multiple garments, accessories, fine-grained categories, layering, and diverse styling, which remain beyond the reach of current VTON systems. Existing datasets cover a limited set of categories and lack outfit diversity. We introduce Garments2Look, the first large-scale multimodal dataset for outfit-level VTON, comprising 80K many-garments-to-one-look pairs across 40 major categories and 300+ fine-grained subcategories. Each pair includes an outfit with 3-12 reference garment images (4.48 on average), an image of a model wearing the outfit, and detailed item-level and try-on textual annotations. To balance authenticity and diversity, we propose a synthesis pipeline that heuristically constructs outfit lists before generating try-on results; the entire process is subject to strict automated filtering and human validation to ensure data quality. To probe task difficulty, we adapt state-of-the-art (SOTA) VTON methods and general-purpose image editing models to establish baselines. Results show that current methods struggle to try on complete outfits seamlessly and to infer correct layering and styling, leading to misalignment and artifacts.
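To make the many-garments-to-one-look structure concrete, the following is a minimal sketch of how a single pair could be represented in code. It only encodes what the abstract states (3-12 reference images per outfit, one model image, item-level and try-on text annotations, major category plus fine-grained subcategory labels). All class names, field names, and the `average_refs` helper are hypothetical; the released Garments2Look files may use a different layout.

```python
from dataclasses import dataclass

# Hypothetical schema: names below are illustrative, not the dataset's actual format.
MIN_REFS, MAX_REFS = 3, 12  # per-outfit reference count range stated in the abstract


@dataclass
class GarmentRef:
    image_path: str   # reference image of one garment or accessory
    category: str     # one of the 40 major categories (labels assumed)
    subcategory: str  # one of the 300+ fine-grained subcategories (labels assumed)
    description: str  # item-level text annotation


@dataclass
class LookPair:
    references: list[GarmentRef]  # the outfit: 3-12 reference items
    look_image_path: str          # image of a model wearing the full outfit
    tryon_caption: str            # try-on text annotation (e.g. layering, styling)

    def __post_init__(self) -> None:
        # Reject outfits outside the reference-count range given in the abstract.
        n = len(self.references)
        if not MIN_REFS <= n <= MAX_REFS:
            raise ValueError(f"outfit has {n} references; expected {MIN_REFS}-{MAX_REFS}")


def average_refs(pairs: list[LookPair]) -> float:
    """Mean number of reference images per pair (the abstract reports 4.48)."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(len(p.references) for p in pairs) / len(pairs)


if __name__ == "__main__":
    refs = [GarmentRef(f"g{i}.jpg", "top", "t-shirt", "plain tee") for i in range(5)]
    pairs = [LookPair(refs[:3], "look1.jpg", "tee"), LookPair(refs, "look2.jpg", "layered")]
    print(average_refs(pairs))  # → 4.0
```

Validating the reference count in `__post_init__` means a malformed outfit fails at load time rather than later inside a training loop; a real loader would likely also check that every image path exists.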
