GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars
August 24, 2024
Authors: Keqiang Sun, Amin Jourabloo, Riddhish Bhalodia, Moustafa Meshry, Yu Rong, Zhengyu Yang, Thu Nguyen-Phuoc, Christian Haene, Jiu Xu, Sam Johnson, Hongsheng Li, Sofien Bouaziz
cs.AI
Abstract
Photo-realistic and controllable 3D avatars are crucial for various
applications such as virtual and mixed reality (VR/MR), telepresence, gaming,
and film production. Traditional methods for avatar creation often involve
time-consuming scanning and reconstruction processes for each avatar, which
limits their scalability. Furthermore, these methods do not offer the
flexibility to sample new identities or modify existing ones. On the other
hand, by learning a strong prior from data, generative models provide a
promising alternative to traditional reconstruction methods, easing the time
constraints for both data capture and processing. Additionally, generative
methods enable downstream applications beyond reconstruction, such as editing
and stylization. Nonetheless, the research on generative 3D avatars is still in
its infancy, and therefore current methods still have limitations such as
creating static avatars, lacking photo-realism, having incomplete facial
details, or having limited drivability. To address this, we propose a
text-conditioned generative model that can generate photo-realistic facial
avatars of diverse identities, with more complete details like hair, eyes and
mouth interior, and which can be driven through a powerful non-parametric
latent expression space. Specifically, we integrate the generative and editing
capabilities of latent diffusion models with a strong prior model for avatar
expression driving.
Our model can generate and control high-fidelity avatars, even those
out-of-distribution. We also highlight its potential for downstream
applications, including avatar editing and single-shot avatar reconstruction.
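To make the two-stage design described in the abstract concrete, below is a minimal sketch of how a text-conditioned identity sampler could be combined with a drivable avatar decoder. This is not the authors' implementation: every class, function, and dimension here (IdentityLatentDiffusion, AvatarDecoder, the 512/256-dimensional latents) is a hypothetical placeholder, and the diffusion sampler is stubbed out rather than performing real denoising.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the pipeline the abstract describes:
# (1) a text-conditioned latent diffusion model samples an identity code,
# (2) a pretrained avatar prior decodes that identity together with a
#     non-parametric latent expression code into a drivable avatar frame.
# All module names and dimensions below are illustrative assumptions.

class IdentityLatentDiffusion(nn.Module):
    """Placeholder for a latent diffusion model over identity codes."""
    def __init__(self, id_dim=512):
        super().__init__()
        self.id_dim = id_dim

    @torch.no_grad()
    def sample(self, text_embedding):
        # A real model would run iterative denoising conditioned on the
        # text embedding; here we just return a latent of the right shape.
        return torch.randn(text_embedding.shape[0], self.id_dim)

class AvatarDecoder(nn.Module):
    """Placeholder for the avatar prior that renders geometry/appearance."""
    def __init__(self, id_dim=512, expr_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(id_dim + expr_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),  # toy "rendered image" output
        )

    def forward(self, identity_code, expression_code):
        # Identity is fixed per avatar; expression varies per frame,
        # which is what makes the avatar drivable.
        z = torch.cat([identity_code, expression_code], dim=-1)
        return self.net(z).view(-1, 3, 64, 64)

# Usage: sample one identity from a text embedding, then animate it by
# swapping in different latent expression codes frame by frame.
text_emb = torch.randn(1, 768)        # stand-in for a CLIP-style text embedding
diffusion = IdentityLatentDiffusion()
decoder = AvatarDecoder()

identity = diffusion.sample(text_emb)  # fixed per avatar
for _ in range(3):                     # e.g. three animation frames
    expression = torch.randn(1, 256)   # per-frame latent expression code
    frame = decoder(identity, expression)
    print(frame.shape)                 # torch.Size([1, 3, 64, 64])
```

The key design point this sketch illustrates is the separation of concerns the abstract claims: generation and editing live in the diffusion model over identity latents, while drivability comes from the expression latent consumed by the decoder, so the same sampled identity can be animated without re-running generation.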