MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
November 18, 2023
Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Xiao Yang, Mohammad Soleymani
cs.AI
Abstract
In this work, we propose MagicDance, a diffusion-based model for 2D human
motion and facial expression transfer on challenging human dance videos.
Specifically, we aim to generate human dance videos of any target identity
driven by novel pose sequences while keeping the identity unchanged. To this
end, we propose a two-stage training strategy to disentangle human motion and
appearance (e.g., facial expressions, skin tone, and clothing): pretraining an
appearance-control block, then fine-tuning an appearance-pose-joint-control
block on human dance poses from the same dataset.
Our novel design enables robust appearance control with temporally consistent
upper body, facial attributes, and even background. By leveraging the prior
knowledge of image diffusion models, the model also generalizes well to unseen
human identities and complex motion sequences without any fine-tuning on
additional data with diverse human attributes. Moreover, the
proposed model is easy to use and can be treated as a plug-in
module/extension to Stable Diffusion. We also demonstrate the model's ability
for zero-shot 2D animation generation, enabling not only appearance transfer
from one identity to another but also cartoon-like stylization given only
pose inputs. Extensive experiments demonstrate our
superior performance on the TikTok dataset.
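The two-stage disentanglement strategy described in the abstract can be pictured with a minimal, self-contained sketch. The module names below (AppearanceControlBlock, PoseControlBlock, DenoisingBackbone) are placeholders invented for illustration, not the authors' implementation; the sketch only shows the pattern the abstract states: pretrain an appearance-control block first, then jointly fine-tune appearance and pose control, with the image-diffusion backbone kept frozen to preserve its prior knowledge.

```python
# A minimal sketch (not the authors' code) of the two-stage training strategy.
# All module names are hypothetical placeholders.
import torch
import torch.nn as nn

class AppearanceControlBlock(nn.Module):
    """Placeholder: encodes a reference image into appearance features."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.SiLU())
    def forward(self, ref_image):
        return self.encoder(ref_image)

class PoseControlBlock(nn.Module):
    """Placeholder: encodes a pose map (e.g., a skeleton image) into features."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.SiLU())
    def forward(self, pose_map):
        return self.encoder(pose_map)

class DenoisingBackbone(nn.Module):
    """Placeholder for a frozen image-diffusion UNet (e.g., Stable Diffusion)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Conv2d(dim, 3, 3, padding=1)
    def forward(self, appearance_feat, pose_feat=None):
        feat = appearance_feat if pose_feat is None else appearance_feat + pose_feat
        return self.net(feat)

appearance, pose, backbone = AppearanceControlBlock(), PoseControlBlock(), DenoisingBackbone()
for p in backbone.parameters():      # the diffusion prior stays frozen
    p.requires_grad_(False)          # in both training stages

ref, target = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)

# Stage 1: pretrain the appearance-control block alone.
opt1 = torch.optim.AdamW(appearance.parameters(), lr=1e-5)
loss = nn.functional.mse_loss(backbone(appearance(ref)), target)
loss.backward(); opt1.step(); opt1.zero_grad()

# Stage 2: fine-tune appearance and pose control jointly on dance poses.
opt2 = torch.optim.AdamW(
    list(appearance.parameters()) + list(pose.parameters()), lr=1e-5)
pose_map = torch.randn(1, 3, 64, 64)
loss = nn.functional.mse_loss(backbone(appearance(ref), pose(pose_map)), target)
loss.backward(); opt2.step(); opt2.zero_grad()
```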
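The "plug-in module/extension to Stable Diffusion" framing resembles the ControlNet pattern, where a conditioning branch attaches to a frozen base model. MagicDance weights are not assumed to be published in any particular library; purely as an analogy, the snippet below shows how a public pose-conditioned ControlNet plugs into a Stable Diffusion pipeline via Hugging Face diffusers. The checkpoint names refer to the public OpenPose ControlNet and SD 1.5, not to MagicDance.

```python
# Illustration of the ControlNet-style plug-in pattern with Hugging Face
# diffusers. The checkpoints are public OpenPose/SD-1.5 weights, used only
# as an analogy for attaching a pose-control branch to a frozen backbone;
# they are NOT MagicDance weights.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,   # the plug-in branch; the base UNet is untouched
    torch_dtype=torch.float16,
).to("cuda")

# `pose_image` would be a PIL image of a pose skeleton for the target frame:
# result = pipe("a person dancing", image=pose_image).images[0]
```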