Synthesizing Moving People with 3D Control
January 19, 2024
Authors: Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik
cs.AI
Abstract
In this paper, we present a diffusion model-based framework for animating
people from a single image, given a target 3D motion sequence. Our approach
has two core components: a) learning priors about invisible parts of the human
body and clothing, and b) rendering novel body poses with proper clothing and
texture. For the first part, we learn an in-filling diffusion model to
hallucinate unseen parts of a person given a single image. We train this model
in texture map space, which makes it more sample-efficient since it is
invariant to pose and viewpoint. Second, we develop a diffusion-based rendering
pipeline, which is controlled by 3D human poses. This produces realistic
renderings of novel poses of the person, including clothing, hair, and
plausible in-filling of unseen regions. This disentangled approach allows our
method to generate a sequence of images that are faithful to the target motion
in terms of 3D pose and to the input image in terms of visual similarity. In
addition, the 3D control allows the person to be rendered along various
synthetic camera trajectories. Our experiments show that, compared to prior
methods, our method is more resilient in generating prolonged motions and
varied, challenging, and complex poses. Please check our website for more details:
https://boyiliee.github.io/3DHM.github.io/.
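
As a reading aid, below is a minimal sketch of the two-stage design the abstract describes: complete a partial texture map once, then render each target 3D pose conditioned on that completed texture. All names (`TexturePainter`, `PoseRenderer`, `animate`) are hypothetical, and plain convolutional networks stand in for the two diffusion models; this illustrates only the data flow, not the authors' implementation.

```python
# Hypothetical sketch of the two-stage pipeline; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TexturePainter(nn.Module):
    """Stage 1 (assumed interface): completes the unseen texels of a partial
    UV texture map extracted from a single image. The paper uses an
    in-filling diffusion model here; a plain conv net stands in for it."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, partial_texture: torch.Tensor,
                visibility_mask: torch.Tensor) -> torch.Tensor:
        # Condition on which texels were actually observed in the input image.
        return self.net(torch.cat([partial_texture, visibility_mask], dim=1))


class PoseRenderer(nn.Module):
    """Stage 2 (assumed interface): renders a realistic frame from the
    completed texture and a rendering of the target 3D pose (the "3D
    control"). Again, a conv net stands in for the diffusion renderer."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, full_texture: torch.Tensor,
                pose_render: torch.Tensor) -> torch.Tensor:
        # Resize the texture map to the frame resolution so the two
        # conditioning signals can be concatenated channel-wise.
        tex = F.interpolate(full_texture, size=pose_render.shape[-2:])
        return self.net(torch.cat([tex, pose_render], dim=1))


def animate(partial_texture, visibility_mask, pose_renders,
            painter: TexturePainter, renderer: PoseRenderer):
    """Complete the texture once, then render one frame per target pose."""
    full_texture = painter(partial_texture, visibility_mask)
    return torch.stack([renderer(full_texture, p) for p in pose_renders])


if __name__ == "__main__":
    painter, renderer = TexturePainter(), PoseRenderer()
    partial_texture = torch.rand(1, 3, 128, 128)   # partial UV texture map
    visibility_mask = torch.rand(1, 1, 128, 128)   # observed-texel mask
    pose_renders = [torch.rand(1, 3, 256, 256) for _ in range(4)]  # target poses
    frames = animate(partial_texture, visibility_mask, pose_renders,
                     painter, renderer)
    print(frames.shape)  # torch.Size([4, 1, 3, 256, 256])
```

The point of the decomposition is that the texture is completed once per person (pose- and viewpoint-invariant), while the pose-conditioned renderer is run per frame, which is what makes long motion sequences and synthetic camera trajectories straightforward to generate.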