GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Abstract

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars: it suffers from learning instability, cannot capture fine avatar geometries, and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation in which Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.
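To make the first idea in the abstract more concrete (Gaussians defined inside pose-driven primitives), the following is a minimal Python sketch of how a Gaussian center stored in a primitive's local frame could follow that primitive as the pose changes. It is not the authors' implementation: the function names `rotation_z` and `gaussian_to_world`, the "forearm" primitive, and the choice of a rigid transform with a single uniform scale are all assumptions made for illustration.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation about the z-axis as nested lists."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def gaussian_to_world(local_pos, rotation, translation, scale):
    """Map a Gaussian center from primitive-local to world coordinates.

    Assumed transform (illustrative only):
        world = rotation @ (scale * local_pos) + translation
    """
    scaled = [scale * v for v in local_pos]
    return [
        sum(rotation[i][j] * scaled[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# One Gaussian, stored once in the local frame of a hypothetical "forearm" primitive.
local_pos = [1.0, 0.0, 0.0]

# Rest pose: the primitive sits at the origin with no rotation.
rest_pose = gaussian_to_world(local_pos, rotation_z(0.0), [0.0, 0.0, 0.0], 0.5)

# Bent pose: the primitive is rotated 90 degrees and shifted along y.
bent_pose = gaussian_to_world(local_pos, rotation_z(math.pi / 2), [0.0, 1.0, 0.0], 0.5)

print(rest_pose)
print(bent_pose)
```

Under these assumptions, only the primitive's rotation, translation, and scale change with the pose; the Gaussian's local position stays fixed, which is what would make such a representation straightforward to animate.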
