Single-Image 3D Human Digitization with Shape-Guided Diffusion

November 15, 2023
Authors: Badour AlBahar, Shunsuke Saito, Hung-Yu Tseng, Changil Kim, Johannes Kopf, Jia-Bin Huang
cs.AI

Abstract

We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image. NeRF and its variants typically require videos or images from different viewpoints. Most existing approaches taking monocular input either rely on ground-truth 3D scans for supervision or lack 3D consistency. While recent 3D generative models show promise of 3D consistent human digitization, these approaches do not generalize well to diverse clothing appearances, and the results lack photorealism. Unlike existing work, we utilize high-capacity 2D diffusion models pretrained for general image synthesis tasks as an appearance prior of clothed humans. To achieve better 3D consistency while retaining the input identity, we progressively synthesize multiple views of the human in the input image by inpainting missing regions with shape-guided diffusion conditioned on silhouette and surface normal. We then fuse these synthesized multi-view images via inverse rendering to obtain a fully textured high-resolution 3D mesh of the given person. Experiments show that our approach outperforms prior methods and achieves photorealistic 360-degree synthesis of a wide range of clothed humans with complex textures from a single image.
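The abstract outlines a pipeline of progressively inpainting novel views with shape-guided diffusion (conditioned on silhouette and surface normals) and then fusing the views into a textured mesh via inverse rendering. The sketch below is a minimal, hypothetical illustration of that control flow only; every helper (estimate_shape, render_guidance, reproject_known_pixels, shape_guided_inpaint, fuse_by_inverse_rendering) is a placeholder stub, not the authors' released code or API.

```python
import numpy as np

# --- placeholder components (hypothetical names, not the authors' code) ---

def estimate_shape(image):
    """Recover a coarse 3D human shape from the single input image (stub)."""
    return {"mesh": "coarse_mesh"}

def render_guidance(shape, azimuth_deg):
    """Render silhouette and surface-normal maps at the target viewpoint (stub)."""
    h, w = 512, 512
    return np.zeros((h, w), dtype=np.float32), np.zeros((h, w, 3), dtype=np.float32)

def reproject_known_pixels(views, shape, azimuth_deg):
    """Warp already-synthesized views into the target view; return the partial
    RGB image and a mask of still-missing pixels (stub)."""
    h, w = 512, 512
    return np.zeros((h, w, 3), dtype=np.float32), np.ones((h, w), dtype=bool)

def shape_guided_inpaint(partial_rgb, missing_mask, silhouette, normals):
    """Diffusion inpainting of only the missing pixels, conditioned on the
    rendered silhouette and normal maps (stub)."""
    return partial_rgb

def fuse_by_inverse_rendering(views, shape):
    """Optimize a textured mesh so its renders match all synthesized views (stub)."""
    return shape["mesh"]

# --- progressive 360-degree synthesis loop, as described in the abstract ---

def digitize(input_rgb, num_views=8):
    shape = estimate_shape(input_rgb)
    views = [(0.0, input_rgb)]  # the observed input view anchors the identity
    for azimuth in np.linspace(0.0, 360.0, num_views, endpoint=False)[1:]:
        silhouette, normals = render_guidance(shape, azimuth)
        partial_rgb, missing = reproject_known_pixels(views, shape, azimuth)
        rgb = shape_guided_inpaint(partial_rgb, missing, silhouette, normals)
        views.append((float(azimuth), rgb))
    return fuse_by_inverse_rendering(views, shape)

if __name__ == "__main__":
    mesh = digitize(np.zeros((512, 512, 3), dtype=np.float32))
```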