Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
July 28, 2024
Authors: ShahRukh Athar, Shunsuke Saito, Zhengyu Yang, Stanislav Pidhorsky, Chen Cao
cs.AI
Abstract
Creating photorealistic avatars for individuals traditionally involves
extensive capture sessions with complex and expensive studio devices like the
LightStage system. While recent strides in neural representations have enabled
the generation of photorealistic and animatable 3D avatars from quick phone
scans, these avatars have the capture-time lighting baked in, lack facial detail, and
have missing regions in areas such as the back of the ears. Thus, they lag in
quality compared to studio-captured avatars. In this paper, we propose a method
that bridges this gap by generating studio-like illuminated texture maps from
short, monocular phone captures. We do this by parameterizing the phone texture
maps using the W^+ space of a StyleGAN2, enabling near-perfect
reconstruction. Then, we finetune a StyleGAN2 by sampling in the W^+
parameterized space using a very small set of studio-captured textures as an
adversarial training signal. To further enhance the realism and accuracy of
facial details, we super-resolve the output of the StyleGAN2 using a carefully
designed diffusion model that is guided by image gradients of the
phone-captured texture map. Once trained, our method excels at producing
studio-like facial texture maps from casual monocular smartphone videos.
Demonstrating its capabilities, we showcase the generation of photorealistic,
uniformly lit, complete avatars from monocular phone captures.
The project page can be found here:
http://shahrukhathar.github.io/2024/07/22/Bridging.html
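The abstract mentions guiding super-resolution with the image gradients of the phone-captured texture map but does not say how those gradients are computed. As a rough illustration only, the sketch below computes per-pixel gradients of a texture with NumPy central differences. The function name `image_gradients`, the grayscale conversion, and the toy ramp texture are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def image_gradients(texture):
    """Return (gx, gy, magnitude) for an H x W or H x W x C texture.

    Hypothetical helper for illustration; the paper's actual gradient
    operator is not specified in the abstract.
    """
    tex = np.asarray(texture, dtype=np.float64)
    if tex.ndim == 3:
        # Assumption: average the channels into a single gray image.
        tex = tex.mean(axis=2)
    # np.gradient returns one array per axis: axis 0 is rows (y), axis 1 is columns (x).
    gy, gx = np.gradient(tex)
    magnitude = np.hypot(gx, gy)
    return gx, gy, magnitude

# Toy texture: intensity increases by 1 per pixel from left to right.
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
gx, gy, mag = image_gradients(ramp)
```

On the toy ramp, the horizontal gradient is constant and the vertical gradient is zero, which is the kind of edge and detail signal that a gradient-guided model could condition on.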