Learning Disentangled Avatars with Hybrid 3D Representations

Abstract

Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations have been heavily studied for holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy, since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications of DELTA. In the first, we disentangle the human body and clothing; in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing are fully disentangled yet jointly rendered. Such disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-source pipeline for the study of hybrid human avatar modeling.
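As a rough illustration of what "integrating meshes into volumetric rendering" can mean, the sketch below composites one camera ray: radiance-field samples in front of the mesh are alpha-composited as usual, and the mesh surface is treated as an opaque endpoint that receives whatever transmittance remains. This is a minimal NumPy sketch under our own assumptions, not the DELTA renderer; the function name `composite_ray` and all of its parameters are hypothetical. It shows only the forward pass, although the same operations would be differentiable if written in an autodiff framework.

```python
import numpy as np

def composite_ray(sigmas, colors, t_vals, far, mesh_depth=None, mesh_color=None):
    """Alpha-composite volume samples along one ray, ending at a mesh hit if any.

    sigmas: (n,) densities, colors: (n, 3) RGB, t_vals: (n,) sorted sample depths.
    far: depth at which the ray ends when it does not hit the mesh.
    mesh_depth / mesh_color: depth and RGB of the mesh hit (hypothetical inputs,
    e.g. produced by a rasterizer); None means the ray misses the mesh.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    t_vals = np.asarray(t_vals, dtype=float)

    end = far
    if mesh_depth is not None:
        # Samples behind the opaque mesh surface cannot contribute.
        keep = t_vals < mesh_depth
        sigmas, colors, t_vals = sigmas[keep], colors[keep], t_vals[keep]
        end = mesh_depth

    # Length of the interval each sample covers; the last one ends at `end`.
    deltas = np.diff(np.append(t_vals, end))
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # trans[i] is the transmittance reaching sample i; trans[-1] is what is left.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))
    weights = trans[:-1] * alphas

    rgb = (weights[:, None] * colors).sum(axis=0)
    residual = trans[-1]
    if mesh_depth is not None:
        # The mesh acts as an opaque surface receiving the remaining transmittance.
        rgb = rgb + residual * np.asarray(mesh_color, dtype=float)
    return rgb, residual
```

With zero density in front of the mesh, the ray returns the mesh color unchanged; with very high density, the volume sample occludes the mesh. In this sketch, that is how a clothing or hair layer can cover the body or face while both are rendered along the same ray.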
