FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

Abstract

We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Because body shape, pose, and clothing type vary widely, most existing methods require hours of per-subject optimization during inference, which limits their practical use. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer a personalized avatar shape, skinning weights, and pose-dependent deformations. This improves overall geometric fidelity and reduces deformation artifacts. To normalize pose variations and resolve the coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process that produces pixel-aligned initial conditions, which helps reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation that robustly reduces artifacts introduced during canonicalization and fuses a plausible avatar that preserves person-specific identity. Finally, we train the model end to end on a large-scale capture dataset containing diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method generates more authentic reconstruction and animation than state-of-the-art methods, and that it generalizes directly to inputs from casually taken phone photos. The project page and code are available at https://github.com/rongakowang/FRESA.
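The abstract contrasts shared skinning weights with personalized ones and relies on mapping posed observations into a canonical space. The NumPy sketch below illustrates the underlying mechanics under stated assumptions. It is not the FRESA implementation. It shows three pieces:

- standard linear blend skinning (LBS) with per-vertex weights and an optional pose-dependent offset;
- the per-point inverse of that blended transform, used as a simple form of canonicalization;
- a masked mean over frames.

All function names are hypothetical. The inverse assumes each posed point's skinning weights are already known, for example borrowed from a body template. The masked mean is only a stand-in for the learned multi-frame aggregation.

```python
import numpy as np

def blend_transforms(weights, bone_transforms):
    """Per-vertex blended 4x4 transforms: sum_k w[v, k] * T[k].

    weights: (V, K), each row sums to 1; bone_transforms: (K, 4, 4).
    """
    return np.einsum("vk,kij->vij", weights, bone_transforms)

def to_homogeneous(points):
    """Append a 1 to each 3D point: (V, 3) -> (V, 4)."""
    return np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)

def lbs_forward(canonical, weights, bone_transforms, pose_offsets=None):
    """Pose canonical vertices with linear blend skinning.

    pose_offsets: optional (V, 3) corrective displacement added in canonical
    space before skinning (a hypothetical stand-in for learned
    pose-dependent deformations).
    """
    x = canonical if pose_offsets is None else canonical + pose_offsets
    blended = blend_transforms(weights, bone_transforms)        # (V, 4, 4)
    posed = np.einsum("vij,vj->vi", blended, to_homogeneous(x))  # (V, 4)
    return posed[:, :3]

def lbs_inverse(posed, weights, bone_transforms):
    """Map posed points back to canonical space (inverse LBS).

    Assumes the skinning weights of each posed point are known and that the
    blended transforms are invertible.
    """
    blended = blend_transforms(weights, bone_transforms)
    inv = np.linalg.inv(blended)  # batched inverse over the V points
    canon = np.einsum("vij,vj->vi", inv, to_homogeneous(posed))
    return canon[:, :3]

def aggregate_frames(per_frame_features, valid_masks):
    """Masked mean of canonical-space features over F frames.

    per_frame_features: (F, V, C); valid_masks: (F, V) booleans marking
    which frame observes which point. Unobserved points get zeros.
    """
    m = valid_masks[..., None].astype(float)
    return (per_frame_features * m).sum(axis=0) / np.maximum(m.sum(axis=0), 1e-8)
```

Without pose offsets, `lbs_inverse(lbs_forward(x, w, T), w, T)` recovers `x`. When offsets are used, the inverse returns the offset canonical points. Blending transforms linearly is the classic LBS choice. It is cheap and differentiable, but it can produce well-known artifacts (such as volume loss) near joints. That limitation is part of why per-subject weights and pose-dependent corrections help.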
