Synthesizing Moving People with 3D Control
January 19, 2024
Authors: Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik
cs.AI
Abstract
In this paper, we present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence. Our approach has two core components: a) learning priors about invisible parts of the human body and clothing, and b) rendering novel body poses with proper clothing and texture. For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image. We train this model on texture map space, which makes it more sample-efficient since it is invariant to pose and viewpoint. Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses. This produces realistic renderings of novel poses of the person, including clothing, hair, and plausible in-filling of unseen regions. This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in 3D pose and to the input image in terms of visual similarity. In addition, the 3D control allows various synthetic camera trajectories for rendering a person. Our experiments show that our method is more resilient than prior methods in generating prolonged motions and varied, challenging, and complex poses. Please check our website for more details: https://boyiliee.github.io/3DHM.github.io/.
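
The abstract describes a two-stage, disentangled design: an in-filling diffusion model that completes the subject's texture map from a single image, followed by a 3D-pose-controlled diffusion renderer that produces each output frame. The sketch below is only a minimal structural illustration of how those two stages might compose; the module names (TextureInfiller, PoseConditionedRenderer), the placeholder convolutional layers, the rasterize stand-in, and the SMPL-style pose vectors are assumptions made here for illustration and are not the authors' actual architecture, which uses diffusion models in both stages.

import torch
import torch.nn as nn

class TextureInfiller(nn.Module):
    """Stand-in for stage 1: takes a partial UV texture map (unseen
    texels zeroed) plus a visibility mask and predicts a completed
    texture map. A real implementation would be a diffusion model."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, partial_texture, visibility_mask):
        x = torch.cat([partial_texture, visibility_mask], dim=1)
        return self.net(x)

class PoseConditionedRenderer(nn.Module):
    """Stand-in for stage 2: refines an intermediate rendering of the
    completed texture re-posed to the target 3D pose into a final RGB
    frame. A real implementation would be a diffusion model."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, intermediate_render):
        return self.net(intermediate_render)

def rasterize(texture, pose):
    """Placeholder for rendering the textured 3D body mesh in the given
    pose; a real pipeline would use a mesh rasterizer here."""
    return texture  # illustrative stand-in only

def animate(partial_texture, visibility_mask, target_poses, infiller, renderer):
    # Stage 1 runs once per subject: complete the texture map.
    full_texture = infiller(partial_texture, visibility_mask)
    # Stage 2 runs once per frame: re-pose the textured body and refine.
    return [renderer(rasterize(full_texture, pose)) for pose in target_poses]

# Hypothetical usage with dummy inputs:
infiller, renderer = TextureInfiller(), PoseConditionedRenderer()
partial = torch.zeros(1, 3, 256, 256)        # partial UV texture from one image
mask = torch.zeros(1, 1, 256, 256)           # 1 where a texel was observed
poses = [torch.zeros(72) for _ in range(4)]  # e.g. SMPL-style pose vectors
frames = animate(partial, mask, poses, infiller, renderer)

The point of the sketch is the interface, not the models: because the texture completion is done once in a pose- and viewpoint-invariant space, only the second stage has to run per frame of the target motion, which is what allows long sequences and arbitrary synthetic camera trajectories.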