Drivable 3D Gaussian Avatars

Abstract

We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require accurate 3D registrations during training, dense input images during testing, or both. Those based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time frame rates, using dense calibrated multi-view videos as input. To deform these primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. We drive these deformations with joint angles and keypoints, which, given their smaller size, are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.
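
To make the idea of a cage deformation concrete, the following is a minimal sketch and not the D3GA implementation. It assumes the simplest possible cage, a single tetrahedron, and uses barycentric coordinates to carry a point from the rest cage to the posed cage. The function names (`barycentric_coords`, `deform_point`, `deformation_gradient`, `deform_covariance`) are hypothetical and chosen for illustration. Applying the cage's affine map to a Gaussian covariance is likewise an assumption about how a volumetric deformation could act on a Gaussian primitive.

```python
import numpy as np

def barycentric_coords(point, tet):
    """Barycentric weights of `point` w.r.t. a tetrahedron `tet` (4x3 array)."""
    # Columns of `edges` are the edge vectors v1-v0, v2-v0, v3-v0.
    edges = (tet[1:] - tet[0]).T
    # Solve edges @ w = point - v0 for the weights of v1, v2, v3.
    w = np.linalg.solve(edges, point - tet[0])
    # The weight of v0 makes all four weights sum to one.
    return np.concatenate(([1.0 - w.sum()], w))

def deform_point(point, rest_tet, posed_tet):
    """Move a point with the cage: same weights, new cage vertices."""
    weights = barycentric_coords(point, rest_tet)
    return weights @ posed_tet

def deformation_gradient(rest_tet, posed_tet):
    """3x3 linear part of the affine map taking the rest cage to the posed cage."""
    rest_edges = (rest_tet[1:] - rest_tet[0]).T
    posed_edges = (posed_tet[1:] - posed_tet[0]).T
    return posed_edges @ np.linalg.inv(rest_edges)

def deform_covariance(cov, rest_tet, posed_tet):
    """Push a 3x3 Gaussian covariance through the cage's affine map (assumption)."""
    J = deformation_gradient(rest_tet, posed_tet)
    return J @ cov @ J.T

# Hypothetical example: a unit tetrahedron scaled by 2 and shifted along x.
rest = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
posed = 2.0 * rest + np.array([1.0, 0.0, 0.0])

center = np.array([0.25, 0.25, 0.25])
moved_center = deform_point(center, rest, posed)
moved_cov = deform_covariance(np.eye(3), rest, posed)
```

Unlike LBS, which blends per-point bone transforms, the point here moves only because the cage vertices move, so everything inside one cage cell deforms consistently as a volume.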