DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
April 2, 2025
Authors: Yuxuan Luo, Zhengkun Rong, Lizhen Wang, Longhao Zhang, Tianshu Hu, Yongming Zhu
cs.AI
Abstract
While recent image-based human animation methods achieve realistic body and
facial motion synthesis, critical gaps remain in fine-grained holistic
controllability, multi-scale adaptability, and long-term temporal coherence,
which limit their expressiveness and robustness. We propose a
diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid
guidance to overcome these limitations. For motion guidance, our hybrid control
signals that integrate implicit facial representations, 3D head spheres, and 3D
body skeletons achieve robust control of facial expressions and body movements,
while producing expressive and identity-preserving animations. For scale
adaptation, to handle various body poses and image scales ranging from
portraits to full-body views, we employ a progressive training strategy using
data with varying resolutions and scales. For appearance guidance, we integrate
motion patterns from sequential frames with complementary visual references,
ensuring long-term temporal coherence for unseen regions during complex
movements. Experiments demonstrate that our method outperforms
state-of-the-art methods, delivering expressive results for portrait,
upper-body, and full-body generation with robust long-term consistency. Project
Page: https://grisoon.github.io/DreamActor-M1/.
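The hybrid motion guidance described above can be pictured as projecting each control signal (implicit facial representation, 3D head sphere, 3D body skeleton) into a shared token space and concatenating the results into one conditioning sequence for the DiT. The sketch below is an illustrative assumption only: the paper does not publish its dimensions or encoders, so the token widths, the 7-value head-sphere parameterization, and the random projection matrices here are all hypothetical stand-ins for learned modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper does not specify its exact dimensions.
D_TOKEN = 64      # shared conditioning-token width fed to the DiT
N_FACE = 8        # tokens from the implicit facial representation
N_JOINT = 18      # 3D body-skeleton joints

# Random-weight stand-ins for the learned projection layers.
W_face = rng.standard_normal((32, D_TOKEN)) * 0.02   # 32-d face latent -> token
W_head = rng.standard_normal((7, D_TOKEN)) * 0.02    # sphere: center xyz, radius, rotation
W_joint = rng.standard_normal((3, D_TOKEN)) * 0.02   # xyz per skeleton joint

def hybrid_guidance_tokens(face_latent, head_sphere, skeleton):
    """Project each control signal into the shared token space and
    concatenate into a single motion-conditioning sequence."""
    face_tok = face_latent @ W_face            # (N_FACE, D_TOKEN)
    head_tok = (head_sphere @ W_head)[None]    # (1, D_TOKEN)
    joint_tok = skeleton @ W_joint             # (N_JOINT, D_TOKEN)
    return np.concatenate([face_tok, head_tok, joint_tok], axis=0)

tokens = hybrid_guidance_tokens(
    face_latent=rng.standard_normal((N_FACE, 32)),
    head_sphere=rng.standard_normal(7),
    skeleton=rng.standard_normal((N_JOINT, 3)),
)
print(tokens.shape)  # (27, 64): one fused conditioning sequence
```

In a full model this sequence would be consumed by the DiT's cross-attention (or concatenated with noised video latents), letting a single denoiser attend jointly to face, head-pose, and body-pose control.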