Drivable 3D Gaussian Avatars

Abstract

We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require accurate 3D registrations during training, dense input images during testing, or both. Those based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform the Gaussian primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. We drive these deformations with joint angles and keypoints, which are compact signals and therefore better suited to communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.
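
To make the contrast with LBS concrete, the following is a minimal sketch of a cage deformation, not the D3GA implementation. It assumes a single tetrahedral cage cell and barycentric coordinates, neither of which the abstract specifies. The function names (`det3`, `barycentric_weights`, `deform_point`) are hypothetical. A point is bound to the cage once at rest and then follows the cage vertices as they move, whereas LBS would blend per-bone transforms for each point.

```python
# Illustrative sketch only: one tetrahedral cage cell with barycentric
# coordinates. The cage type and all names are assumptions, not D3GA's code.

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))

def sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def barycentric_weights(p, tet):
    """Weights (w0..w3) such that p = sum(w_i * tet[i]) and sum(w_i) = 1."""
    v0, v1, v2, v3 = tet
    e1, e2, e3 = sub(v1, v0), sub(v2, v0), sub(v3, v0)
    d = sub(p, v0)
    vol = det3(e1, e2, e3)
    if vol == 0:
        raise ValueError("degenerate tetrahedron")
    # Cramer's rule for the 3x3 system [e1 e2 e3] * (w1, w2, w3) = d.
    w1 = det3(d, e2, e3) / vol
    w2 = det3(e1, d, e3) / vol
    w3 = det3(e1, e2, d) / vol
    return (1.0 - w1 - w2 - w3, w1, w2, w3)

def deform_point(p, rest_tet, posed_tet):
    """Move p with the cage: reuse rest-pose weights on the posed vertices."""
    w = barycentric_weights(p, rest_tet)
    return tuple(sum(w[i] * posed_tet[i][k] for i in range(4)) for k in range(3))

# Rest-pose cage cell and a point inside it (e.g. a Gaussian's mean).
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
point = (0.25, 0.25, 0.25)

# Posed cage: shifted by +1 along x, with the fourth vertex stretched along z.
posed = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 2.0)]

moved = deform_point(point, rest, posed)
print(moved)
```

This sketch moves only a point position. A full Gaussian primitive also carries a rotation and a scale, which would have to be updated from the cell's deformation as well; that step is omitted here.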