AvatarReX: Real-time Expressive Full-body Avatars

Abstract

We present AvatarReX, a new method for learning NeRF-based full-body avatars from video data. The learned avatar provides joint expressive control of the body, hands, and face, and also supports real-time animation and rendering. To this end, we propose a compositional avatar representation in which the body, hands, and face are modeled separately, so that the structural prior from parametric mesh templates is properly exploited without compromising the flexibility of the representation. Furthermore, we disentangle geometry and appearance for each part. Building on these design choices, we propose a dedicated deferred rendering pipeline that runs at real-time frame rates and synthesizes high-quality free-view images. The disentanglement of geometry and appearance also lets us design a two-pass training strategy that combines volume rendering and surface rendering for network training. In this way, patch-level supervision can be applied to force the network to learn sharp appearance details on the basis of the estimated geometry. Overall, our method enables the automatic construction of expressive full-body avatars with real-time rendering capability, and can generate photorealistic images with dynamic details for novel body motions and facial expressions.
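For readers unfamiliar with the volume rendering that the two-pass training strategy relies on, the following is a minimal sketch of the standard NeRF compositing rule along a single ray. It is not the AvatarReX implementation: the function name `composite_ray` and its inputs are hypothetical, and the abstract does not specify how the method samples rays or combines the body, hand, and face parts.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    densities: per-sample volume density sigma_i (non-negative floats)
    colors:    per-sample RGB triples
    deltas:    distance between consecutive samples
    Returns the composited RGB value and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this sample interval.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        # Light remaining after passing through this sample.
        transmittance *= 1.0 - alpha
    return rgb, weights


# Illustrative input: empty space, then an opaque surface, then an occluded sample.
rgb, weights = composite_ray(
    densities=[0.0, 1e6, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

When the weights concentrate on a single sample, as in this input, the ray effectively hits a surface. That is the general intuition behind pairing volume rendering with surface rendering, although the exact way AvatarReX couples the two passes is not described in the abstract.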
