MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
November 18, 2023
Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Xiao Yang, Mohammad Soleymani
cs.AI
Abstract
In this work, we propose MagicDance, a diffusion-based model for 2D human
motion and facial expression transfer on challenging human dance videos.
Specifically, we aim to generate human dance videos of any target identity
driven by novel pose sequences while keeping the identity unchanged. To this
end, we propose a two-stage training strategy to disentangle human motions
and appearance (e.g., facial expressions, skin tone, and clothing),
consisting of the pretraining of an appearance-control block and the
fine-tuning of an appearance-pose joint-control block over human dance poses
of the same dataset. Our novel design enables robust appearance control with
a temporally consistent upper body, facial attributes, and even background.
By leveraging the prior knowledge of image diffusion models, the model also
generalizes well to unseen human identities and complex motion sequences
without any fine-tuning on additional data with diverse human attributes.
Moreover, the proposed model is easy to use and can be treated as a plug-in
module/extension to Stable Diffusion. We also demonstrate the model's
capability for zero-shot 2D animation generation, enabling not only
appearance transfer from one identity to another but also cartoon-like
stylization given only pose inputs. Extensive experiments demonstrate our
superior performance on the TikTok dataset.
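
The abstract describes the two-stage strategy only at a high level. Below is
a minimal, hypothetical PyTorch sketch of how such a scheme could look, with
ControlNet-style zero-initialized control blocks injecting additive residuals
into a frozen Stable Diffusion UNet. All class names, tensor shapes, and the
stub UNet are illustrative assumptions for exposition, not the authors'
released code.

# Hypothetical sketch of the two-stage training strategy from the abstract.
# Names (ZeroConvBlock, FrozenUNetStub) are illustrative assumptions, not the
# authors' actual implementation.
import torch
import torch.nn as nn

class ZeroConvBlock(nn.Module):
    """Control block whose output projection is zero-initialized, so it
    injects nothing at the start of training (ControlNet-style)."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
        )
        self.zero_out = nn.Conv2d(ch, ch, 1)
        nn.init.zeros_(self.zero_out.weight)
        nn.init.zeros_(self.zero_out.bias)

    def forward(self, x):
        return self.zero_out(self.body(x))

class FrozenUNetStub(nn.Module):
    """Stand-in for the frozen Stable Diffusion UNet: accepts an additive
    control residual and predicts the noise."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Conv2d(ch, ch, 3, padding=1)
        for p in self.parameters():
            p.requires_grad_(False)  # base model stays frozen

    def forward(self, noisy_latent, control_residual):
        return self.net(noisy_latent + control_residual)

ch = 64
unet = FrozenUNetStub(ch)
appearance_block = ZeroConvBlock(ch)  # conditioned on a reference frame
pose_block = ZeroConvBlock(ch)        # conditioned on pose/expression maps

def diffusion_loss(noisy_latent, noise, residual):
    return nn.functional.mse_loss(unet(noisy_latent, residual), noise)

# Dummy batch: encoded reference image, noisy latent, target noise.
ref_feat = torch.randn(2, ch, 32, 32)
noisy = torch.randn(2, ch, 32, 32)
noise = torch.randn(2, ch, 32, 32)

# Stage 1: pretrain the appearance-control block alone, so identity and
# appearance are learned independently of motion.
opt1 = torch.optim.AdamW(appearance_block.parameters(), lr=1e-5)
loss = diffusion_loss(noisy, noise, appearance_block(ref_feat))
loss.backward()
opt1.step()
opt1.zero_grad()

# Stage 2: jointly fine-tune appearance and pose control over dance poses
# from the same dataset, summing both residuals before the frozen UNet.
opt2 = torch.optim.AdamW(
    list(appearance_block.parameters()) + list(pose_block.parameters()),
    lr=1e-5)
pose_feat = torch.randn(2, ch, 32, 32)  # encoded pose/expression map (dummy)
residual = appearance_block(ref_feat) + pose_block(pose_feat)
loss = diffusion_loss(noisy, noise, residual)
loss.backward()
opt2.step()
opt2.zero_grad()

Because the output convolutions are zero-initialized, each control block is a
no-op at initialization and the pretrained diffusion prior is left intact when
control training begins, which is consistent with the abstract's framing of
the model as a plug-in module/extension to Stable Diffusion.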