UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections

Abstract

We present UP2You, the first tuning-free solution for reconstructing high-fidelity 3D clothed portraits from extremely unconstrained in-the-wild 2D photos. Unlike previous approaches that require "clean" inputs (e.g., full-body images with minimal occlusions, or well-calibrated cross-view captures), UP2You directly processes raw, unstructured photographs, which may vary significantly in pose, viewpoint, cropping, and occlusion. Instead of compressing data into tokens for slow online text-to-3D optimization, we introduce a data rectifier paradigm that efficiently converts unconstrained inputs into clean, orthogonal multi-view images in a single forward pass within seconds, which simplifies the subsequent 3D reconstruction. Central to UP2You is a pose-correlated feature aggregation module (PCFA) that selectively fuses information from multiple reference images with respect to target poses, enabling better identity preservation and a nearly constant memory footprint as more observations are added. We also introduce a perceiver-based multi-reference shape predictor, removing the need for pre-captured body templates. Extensive experiments on 4D-Dress, PuzzleIOI, and in-the-wild captures demonstrate that UP2You consistently surpasses previous methods in both geometric accuracy (Chamfer and P2S errors reduced by 15% and 18% on PuzzleIOI) and texture fidelity (PSNR improved by 21% and LPIPS reduced by 46% on 4D-Dress). UP2You is efficient (1.5 minutes per person) and versatile (it supports arbitrary pose control and training-free multi-garment 3D virtual try-on), making it practical for real-world scenarios where humans are casually captured. Both models and code will be released to facilitate future research on this underexplored task. Project Page: https://zcai0612.github.io/UP2You
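
The abstract does not specify how PCFA is implemented, so the following is only a minimal sketch of the general idea it describes: weighting reference features by their correlation with a target pose and fusing them into a fixed-size result. The function name `aggregate_references`, the plain-list feature vectors, and the scaled dot-product softmax weighting are all assumptions made for illustration, not the authors' method.

```python
import math


def aggregate_references(target_query, reference_features):
    """Fuse per-reference feature vectors into one vector for a target pose.

    target_query: hypothetical embedding of the target pose (list of floats).
    reference_features: one hypothetical feature vector per input photo,
        each with the same length as target_query.
    Returns (fused, weights); fused has the same length as target_query
    no matter how many references are supplied.
    """
    if not reference_features:
        raise ValueError("need at least one reference feature vector")
    dim = len(target_query)
    if any(len(feat) != dim for feat in reference_features):
        raise ValueError("all feature vectors must match the query length")

    # Correlation score between the target pose and each reference
    # (scaled dot product, an assumed stand-in for the real similarity).
    scores = [
        sum(q * f for q, f in zip(target_query, feat)) / math.sqrt(dim)
        for feat in reference_features
    ]

    # Numerically stable softmax turns the scores into fusion weights.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum: the output size depends only on dim, not on the
    # number of references.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, reference_features))
        for i in range(dim)
    ]
    return fused, weights


query = [1.0, 0.0]
refs = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
fused, weights = aggregate_references(query, refs)
print(len(fused), [round(w, 3) for w in weights])
```

In this toy setup the reference most aligned with the query receives the largest weight, and the fused vector keeps the same length when further references are appended. That fixed output size is the property the abstract associates with a nearly constant memory footprint; how the actual module achieves it is not stated in the text.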
