Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
July 28, 2024
Authors: ShahRukh Athar, Shunsuke Saito, Zhengyu Yang, Stanislav Pidhorsky, Chen Cao
cs.AI
Abstract
Creating photorealistic avatars for individuals traditionally involves
extensive capture sessions with complex and expensive studio devices like the
LightStage system. While recent strides in neural representations have enabled
the generation of photorealistic and animatable 3D avatars from quick phone
scans, these avatars have the capture-time lighting baked in, lack facial details, and
have missing regions in areas such as the back of the ears. Thus, they lag in
quality compared to studio-captured avatars. In this paper, we propose a method
that bridges this gap by generating studio-like illuminated texture maps from
short, monocular phone captures. We do this by parameterizing the phone texture
maps using the W^+ space of a StyleGAN2, enabling near-perfect
reconstruction. Then, we finetune a StyleGAN2 by sampling in the W^+
parameterized space using a very small set of studio-captured textures as an
adversarial training signal. To further enhance the realism and accuracy of
facial details, we super-resolve the output of the StyleGAN2 using a carefully
designed diffusion model that is guided by image gradients of the
phone-captured texture map. Once trained, our method excels at producing
studio-like facial texture maps from casual monocular smartphone videos.
Demonstrating its capabilities, we showcase the generation of photorealistic,
uniformly lit, complete avatars from monocular phone captures.
Project page: http://shahrukhathar.github.io/2024/07/22/Bridging.html
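The abstract states that the diffusion-based super-resolution step is guided by image gradients of the phone-captured texture map, but it does not say how those gradients are computed or how they enter the model. The sketch below is only an illustration of what an image-gradient signal is, under assumptions of our own: gradients are taken with NumPy central differences, and the hypothetical helper `gradient_mismatch` compares two textures by their gradients alone. Neither function name comes from the paper.

```python
import numpy as np


def image_gradients(texture):
    """Return (d/dy, d/dx) of an H x W or H x W x C texture.

    Assumption: plain finite differences via np.gradient; the paper
    does not specify the gradient operator it uses.
    """
    tex = np.asarray(texture, dtype=np.float64)
    # Differentiate along the two spatial axes only, not across channels.
    grad_y, grad_x = np.gradient(tex, axis=(0, 1))
    return grad_y, grad_x


def gradient_mismatch(candidate, reference):
    """Mean squared difference between the gradients of two same-sized textures.

    Hypothetical illustration of a gradient-based comparison, not the
    guidance term used by the authors.
    """
    cand_y, cand_x = image_gradients(candidate)
    ref_y, ref_x = image_gradients(reference)
    return float(np.mean((cand_y - ref_y) ** 2 + (cand_x - ref_x) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phone_texture = rng.random((8, 8, 3))

    # A constant brightness offset leaves the gradients untouched.
    brighter = phone_texture + 0.3
    print(gradient_mismatch(brighter, phone_texture))

    # Removing all detail (a flat texture) changes the gradients.
    flat = np.full_like(phone_texture, phone_texture.mean())
    print(gradient_mismatch(flat, phone_texture))
```

One reason a gradient signal is a plausible fit here: in this sketch, adding a constant brightness offset to a texture leaves its gradients unchanged, so the comparison reacts to local detail such as edges and fine structure rather than to the overall brightness level. Whether the authors rely on this property is not stated in the abstract.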