ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image

Abstract

Recent advancements in text-to-image generation have enabled significant progress in zero-shot 3D shape generation. This is achieved by score distillation, a methodology that uses pre-trained text-to-image diffusion models to optimize the parameters of a 3D neural representation, such as a Neural Radiance Field (NeRF). While showing promising results, existing methods often fail to preserve the geometry of complex shapes, such as human bodies. To address this challenge, we present ZeroAvatar, a method that introduces an explicit 3D human body prior into the optimization process. Specifically, we first estimate and refine the parameters of a parametric human body from a single image. Then, during optimization, we use the posed parametric body as an additional geometric constraint to regularize the diffusion model as well as the underlying density field. Lastly, we propose a UV-guided texture regularization term to further guide the completion of texture on invisible body parts. We show that ZeroAvatar significantly enhances the robustness and 3D consistency of optimization-based image-to-3D avatar generation, outperforming existing zero-shot image-to-3D methods.
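
To make the score distillation idea concrete, the following is a minimal toy sketch, not the ZeroAvatar implementation. It assumes a generic score-distillation update of the form w(t) * (predicted noise - injected noise) * dx/dtheta, an identity "renderer" in place of a NeRF, and an analytic stand-in denoiser in place of a pre-trained diffusion model. The function name `toy_sds_fit` and all of its parameters are hypothetical.

```python
import numpy as np

def toy_sds_fit(target, steps=500, lr=0.1, seed=0):
    """Toy score distillation: pull parameters theta toward `target`.

    Assumptions (none of this comes from the paper): the "renderer" is the
    identity map x = theta, and the "pre-trained diffusion model" is the
    analytically optimal noise predictor for a data distribution that is
    concentrated at `target`.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    theta = np.zeros_like(target)
    for _ in range(steps):
        # Sample a noise level with alpha^2 + sigma^2 = 1.
        alpha = rng.uniform(0.2, 0.98)
        sigma = np.sqrt(1.0 - alpha**2)
        eps = rng.standard_normal(theta.shape)

        x = theta                      # identity "render" of the parameters
        x_t = alpha * x + sigma * eps  # noised render

        # Stand-in denoiser: the noise that would explain x_t if the clean
        # sample were exactly `target`.
        eps_hat = (x_t - alpha * target) / sigma

        # Score distillation gradient with weight w(t) = sigma and
        # dx/dtheta = identity; the denoiser is never differentiated.
        grad = sigma * (eps_hat - eps)
        theta = theta - lr * grad
    return theta

if __name__ == "__main__":
    print(toy_sds_fit([1.0, -2.0, 0.5]))
```

In this toy setting the injected noise cancels out of the gradient, so each step simply moves theta a fraction of the way toward the target. With a real diffusion model and a 3D representation, the same update only supplies a noisy 2D signal per rendered view, which is one way to see why additional geometric constraints such as a body prior can help.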
