Learning Disentangled Avatars with Hybrid 3D Representations

Abstract

Tremendous efforts have been made to learn animatable and photorealistic human avatars. To this end, both explicit and implicit 3D representations have been heavily studied for holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy, since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars (DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications of DELTA. In the first, we disentangle the human body and clothing; in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model, and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing are fully disentangled yet jointly rendered. Such disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-source pipeline for the study of hybrid human avatar modeling.
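To make the idea of integrating a mesh into volumetric rendering concrete, the following is a minimal single-ray sketch and not DELTA's actual renderer. It assumes the mesh is fully opaque, so radiance-field samples are alpha-composited front to back until the ray reaches the mesh, and the mesh color is then weighted by whatever transmittance remains. The function name `composite_ray` and all of its parameters are hypothetical, chosen only for illustration.

```python
import math


def composite_ray(densities, colors, deltas, depths,
                  mesh_depth=None, mesh_color=None):
    """Composite radiance-field samples along one ray with an opaque mesh hit.

    densities, colors, deltas, depths: per-sample density, RGB color,
        segment length and depth, sorted from near to far.
    mesh_depth, mesh_color: depth and RGB color of the first ray-mesh
        intersection, or None if the ray misses the mesh.
    Returns (rgb, remaining_transmittance).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for sigma, color, delta, depth in zip(densities, colors, deltas, depths):
        if mesh_depth is not None and depth >= mesh_depth:
            break  # assumption: samples behind the mesh are fully occluded
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        rgb = [c + weight * s for c, s in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    if mesh_depth is not None:
        # assumption: the mesh is opaque and absorbs all remaining light
        rgb = [c + transmittance * m for c, m in zip(rgb, mesh_color)]
        transmittance = 0.0
    return rgb, transmittance
```

Under these assumptions, a ray that crosses empty space in front of the mesh returns the mesh color unchanged, while a ray that first crosses dense volume (for example, clothing or hair) returns the volume color and hides the mesh behind it. Every step is a smooth function of the densities and colors, which is what would let a real implementation (written in an autodiff framework rather than plain Python) backpropagate image losses to both representations.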