HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
February 9, 2024
Authors: Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
cs.AI
Abstract
Creating digital avatars from textual prompts has long been a desirable yet
challenging task. Despite the promising outcomes obtained through 2D diffusion
priors in recent works, current methods face challenges in achieving
high-quality and animated avatars effectively. In this paper, we present
HeadStudio, a novel framework that utilizes 3D Gaussian splatting to
generate realistic and animated avatars from text prompts. Our method drives 3D
Gaussians semantically to create a flexible and achievable appearance through
the intermediate FLAME representation. Specifically, we incorporate the FLAME
into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian
splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh. 2)
FLAME-based score distillation sampling, utilizing FLAME-based fine-grained
control signals to guide score distillation from the text prompt. Extensive
experiments demonstrate the efficacy of HeadStudio in generating animatable
avatars from textual prompts, exhibiting visually appealing appearances. The
avatars are capable of rendering high-quality real-time (≥ 40 fps) novel
views at a resolution of 1024. They can be smoothly controlled by real-world
speech and video. We hope that HeadStudio can advance digital avatar creation
and that the present method can widely be applied across various domains.
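
The abstract says each 3D Gaussian point is rigged to a FLAME mesh but gives no formula. The sketch below shows one plausible reading, not the authors' implementation. It assumes each point stores an offset in the local frame of a single mesh triangle and is mapped to world space by that triangle's centroid, orientation, and size. The function names, the choice of frame axes, and the use of the first edge length as the scale are all assumptions made for illustration.

```python
import math

# Small 3-vector helpers, so the sketch needs only the standard library.
def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def add(a, b):
    return [a[i] + b[i] for i in range(3)]

def scale(a, s):
    return [x * s for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def normalize(a):
    return scale(a, 1.0 / norm(a))

def triangle_frame(v0, v1, v2):
    """Return (centroid, axes, size) describing one mesh triangle.

    Assumption: the frame is built from the first edge and the face
    normal, and the first edge length serves as an isotropic scale.
    """
    centroid = [(v0[i] + v1[i] + v2[i]) / 3.0 for i in range(3)]
    edge = sub(v1, v0)
    x_axis = normalize(edge)
    z_axis = normalize(cross(edge, sub(v2, v0)))  # face normal
    y_axis = cross(z_axis, x_axis)                # completes a right-handed frame
    return centroid, (x_axis, y_axis, z_axis), norm(edge)

def rigged_position(local, v0, v1, v2):
    """Map a point's local offset to world space for the given triangle."""
    centroid, axes, size = triangle_frame(v0, v1, v2)
    world = centroid
    for coeff, axis in zip(local, axes):
        world = add(world, scale(axis, size * coeff))
    return world

# Hypothetical usage: the same local offset evaluated on a rest-pose
# triangle and on a moved, enlarged copy of it.
rest = rigged_position([0.0, 0.0, 0.5], [0, 0, 0], [1, 0, 0], [0, 1, 0])
posed = rigged_position([0.0, 0.0, 0.5], [0, 0, 2], [2, 0, 2], [0, 2, 2])
```

Because the offset lives in the triangle's frame, moving the mesh vertices (for example through new expression or pose parameters) moves the point with them, which is the property that rigging to an animatable mesh is meant to provide.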