

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

August 24, 2024
作者: Keqiang Sun, Amin Jourabloo, Riddhish Bhalodia, Moustafa Meshry, Yu Rong, Zhengyu Yang, Thu Nguyen-Phuoc, Christian Haene, Jiu Xu, Sam Johnson, Hongsheng Li, Sofien Bouaziz
cs.AI

Abstract

Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identities or modify existing ones. On the other hand, by learning a strong prior from data, generative models provide a promising alternative to traditional reconstruction methods, easing the time constraints for both data capture and processing. Additionally, generative methods enable downstream applications beyond reconstruction, such as editing and stylization. Nonetheless, the research on generative 3D avatars is still in its infancy, and therefore current methods still have limitations such as creating static avatars, lacking photo-realism, having incomplete facial details, or having limited drivability. To address this, we propose a text-conditioned generative model that can generate photo-realistic facial avatars of diverse identities, with more complete details like hair, eyes and mouth interior, and which can be driven through a powerful non-parametric latent expression space. Specifically, we integrate the generative and editing capabilities of latent diffusion models with a strong prior model for avatar expression driving. Our model can generate and control high-fidelity avatars, even those out-of-distribution. We also highlight its potential for downstream applications, including avatar editing and single-shot avatar reconstruction.
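The abstract describes integrating a text-conditioned latent diffusion model with a prior model for expression driving, but the paper's actual architecture is not given here. As a rough illustration of the general pattern only, the sketch below shows text-conditioned latent sampling with classifier-free guidance over a toy denoiser; every name, dimension, and update rule is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

T = 50                # number of diffusion steps (assumed)
GUIDANCE_SCALE = 3.0  # classifier-free guidance weight (assumed)
rng = np.random.default_rng(0)

def toy_denoiser(x, t, cond):
    # Hypothetical stand-in for a trained noise-prediction network:
    # it simply nudges the latent toward the conditioning vector.
    return x - cond * (t / T)

def sample_latent(text_embedding, dim=8):
    """Sample an avatar latent conditioned on a text embedding,
    mixing conditional and unconditional predictions
    (classifier-free guidance)."""
    x = rng.standard_normal(dim)   # start from Gaussian noise
    null = np.zeros(dim)           # "empty prompt" embedding
    for t in range(T, 0, -1):
        eps_cond = toy_denoiser(x, t, text_embedding)
        eps_uncond = toy_denoiser(x, t, null)
        # Push the prediction away from the unconditional direction.
        eps = eps_uncond + GUIDANCE_SCALE * (eps_cond - eps_uncond)
        x = x - eps / T            # simplified step, not a real scheduler
    return x
```

In the paper's setting, the resulting latent would be decoded into an avatar that can then be driven through a separate expression space; that decoding and driving stage is beyond this sketch.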

