FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

Abstract

Given an image of a person and an image of a garment, virtual try-on (VTO) aims to synthesize a realistic image of the person wearing the garment while preserving their original pose and identity. Although recent VTO methods excel at visualizing garment appearance, they largely overlook a crucial aspect of the try-on experience: the accuracy of garment fit, for example how an extra-large shirt looks on an extra-small person. A key obstacle is the absence of datasets that provide precise garment and body size information, particularly for "ill-fit" cases, where garments are significantly too large or too small. Consequently, current VTO methods default to generating well-fitted results regardless of garment or person size. In this paper, we take the first steps towards solving this open problem. We introduce FIT (Fit-Inclusive Try-on), a large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. We overcome the challenges of data collection with a scalable synthetic strategy: (1) We programmatically generate 3D garments using GarmentCode and drape them via physics simulation to capture realistic garment fit. (2) We employ a novel re-texturing framework to transform synthetic renderings into photorealistic images while strictly preserving geometry. (3) We introduce person identity preservation into our re-texturing model to generate paired person images (same person, different garments) for supervised training. Finally, we use our FIT dataset to train a baseline fit-aware virtual try-on model. Our data and results set a new state of the art for fit-aware virtual try-on and offer a robust benchmark for future research. We will make all data and code publicly available on our project page: https://johannakarras.github.io/FIT.
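To make the idea of a measurement-annotated try-on triplet concrete, the following is a minimal Python sketch. It is not the released FIT schema: the record layout, the field names, the choice of chest circumference as the only measurement, and the ease thresholds are all hypothetical. The sketch also assumes that a triplet consists of a person image, a garment image, and a target try-on image, which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TryOnTriplet:
    """Hypothetical record layout; NOT the released FIT schema."""

    person_image: str        # path to the input person image (assumed field)
    garment_image: str       # path to the garment image (assumed field)
    target_image: str        # path to the ground-truth try-on image (assumed field)
    body_chest_cm: float     # body chest circumference in cm (assumed field)
    garment_chest_cm: float  # garment chest circumference in cm (assumed field)


def chest_ease_cm(triplet: TryOnTriplet) -> float:
    """Ease: garment circumference minus body circumference, in cm."""
    return triplet.garment_chest_cm - triplet.body_chest_cm


def fit_label(
    triplet: TryOnTriplet,
    tight_below_cm: float = 0.0,   # illustrative threshold, not from the paper
    loose_above_cm: float = 20.0,  # illustrative threshold, not from the paper
) -> str:
    """Coarse fit category derived from chest ease alone."""
    ease = chest_ease_cm(triplet)
    if ease < tight_below_cm:
        return "too_small"
    if ease > loose_above_cm:
        return "too_large"
    return "regular"


# Example: a large garment on a small body, i.e. an "ill-fit" case.
oversized = TryOnTriplet("person.png", "garment.png", "target.png", 80.0, 120.0)
print(chest_ease_cm(oversized), fit_label(oversized))
```

A single scalar such as chest ease is only a rough proxy for fit. It is used here to show why paired body and garment measurements allow ill-fit cases to be identified at all, not to describe how the dataset labels them.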
