DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
April 2, 2025
Authors: Yuxuan Luo, Zhengkun Rong, Lizhen Wang, Longhao Zhang, Tianshu Hu, Yongming Zhu
cs.AI
Abstract
While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, which limit their expressiveness and robustness. We propose a diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid guidance to overcome these limitations. For motion guidance, our hybrid control signals, which integrate implicit facial representations, 3D head spheres, and 3D body skeletons, achieve robust control of facial expressions and body movements while producing expressive, identity-preserving animations. For scale adaptation, to handle various body poses and image scales ranging from portraits to full-body views, we employ a progressive training strategy using data with varying resolutions and scales. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements. Experiments demonstrate that our method outperforms state-of-the-art approaches, delivering expressive results for portrait, upper-body, and full-body generation with robust long-term consistency. Project Page: https://grisoon.github.io/DreamActor-M1/.
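As a rough illustration of the hybrid motion guidance described in the abstract, the sketch below shows one plausible way the three control signals (an implicit facial representation, a rendered 3D head sphere, and a rendered 3D body skeleton) could be encoded and fused into a single conditioning token sequence for a DiT backbone. This is a minimal, hypothetical PyTorch sketch; the module name, dimensions, and fusion-by-concatenation design are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch (not the paper's code): fusing the three hybrid
# control signals -- an implicit facial representation vector, a rendered
# 3D head-sphere image, and a rendered 3D body-skeleton image -- into one
# conditioning token sequence that a DiT could attend to.
import torch
import torch.nn as nn

class HybridGuidanceEncoder(nn.Module):
    """Assumed module: maps the three control signals to a token sequence."""
    def __init__(self, face_dim=512, hidden=1024, patch=16):
        super().__init__()
        # Implicit facial representation: a latent vector from some
        # face-motion encoder (assumed dimensionality).
        self.face_proj = nn.Linear(face_dim, hidden)
        # Head-sphere and skeleton renderings: treated here as RGB
        # conditioning images, patchified with strided convolutions.
        self.sphere_embed = nn.Conv2d(3, hidden, kernel_size=patch, stride=patch)
        self.skeleton_embed = nn.Conv2d(3, hidden, kernel_size=patch, stride=patch)

    def forward(self, face_latent, head_sphere, body_skeleton):
        # (B, face_dim) -> (B, 1, hidden): one token for facial motion.
        face_tok = self.face_proj(face_latent).unsqueeze(1)
        # (B, 3, H, W) -> (B, N, hidden): one token per image patch.
        sph_tok = self.sphere_embed(head_sphere).flatten(2).transpose(1, 2)
        skel_tok = self.skeleton_embed(body_skeleton).flatten(2).transpose(1, 2)
        # Concatenate into a single conditioning sequence; a DiT block could
        # then consume these tokens, e.g. via cross-attention.
        return torch.cat([face_tok, sph_tok, skel_tok], dim=1)

# Usage with dummy inputs (batch of 2, 256x256 renderings):
enc = HybridGuidanceEncoder()
tokens = enc(torch.randn(2, 512),
             torch.randn(2, 3, 256, 256),
             torch.randn(2, 3, 256, 256))
print(tokens.shape)  # torch.Size([2, 513, 1024]) -- 1 face + 256 + 256 tokens
```

One appeal of keeping the signals as separate token groups, as sketched here, is that face, head, and body control remain individually swappable, which matches the fine-grained holistic controllability the abstract emphasizes.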