Single-Image 3D Human Digitization with Shape-Guided Diffusion
November 15, 2023
Authors: Badour AlBahar, Shunsuke Saito, Hung-Yu Tseng, Changil Kim, Johannes Kopf, Jia-Bin Huang
cs.AI
Abstract
We present an approach to generate a 360-degree view of a person with a
consistent, high-resolution appearance from a single input image. NeRF and its
variants typically require videos or images from different viewpoints. Most
existing approaches taking monocular input either rely on ground-truth 3D scans
for supervision or lack 3D consistency. While recent 3D generative models show
promise of 3D consistent human digitization, these approaches do not generalize
well to diverse clothing appearances, and the results lack photorealism. Unlike
existing work, we utilize high-capacity 2D diffusion models pretrained for
general image synthesis tasks as an appearance prior of clothed humans. To
achieve better 3D consistency while retaining the input identity, we
progressively synthesize multiple views of the human in the input image by
inpainting missing regions with shape-guided diffusion conditioned on
silhouette and surface normal. We then fuse these synthesized multi-view images
via inverse rendering to obtain a fully textured high-resolution 3D mesh of the
given person. Experiments show that our approach outperforms prior methods and
achieves photorealistic 360-degree synthesis of a wide range of clothed humans
with complex textures from a single image.
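The key step the abstract describes is filling in regions of a novel view that are not visible from the input image with a pretrained 2D diffusion inpainting model. The sketch below is a minimal illustration, not the authors' implementation: it uses the publicly available Stable Diffusion inpainting pipeline from the diffusers library with a placeholder model id and prompt, and it omits the paper's silhouette/surface-normal shape guidance, the view-by-view reprojection, and the inverse-rendering fusion stage.

```python
# Minimal sketch of diffusion-based inpainting of a novel view (assumptions:
# diffusers + a CUDA GPU; the reprojected partial view and its missing-region
# mask are assumed to be precomputed elsewhere in the pipeline).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # placeholder; not the paper's model
    torch_dtype=torch.float16,
).to("cuda")

def inpaint_novel_view(partial_view: Image.Image, missing_mask: Image.Image) -> Image.Image:
    """Fill pixels not covered by the input or previously synthesized views.

    `partial_view` holds pixels reprojected from known views; `missing_mask`
    is white where the diffusion model must hallucinate new content.
    """
    return pipe(
        prompt="a photo of a person, full body, consistent clothing",  # placeholder prompt
        image=partial_view.resize((512, 512)),
        mask_image=missing_mask.resize((512, 512)),
    ).images[0]
```

In the full method, this inpainting call would additionally be conditioned on the silhouette and surface normals rendered from the estimated body shape, and the resulting multi-view images would be fused into a textured mesh via inverse rendering.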