SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

November 24, 2025
Authors: Jiaming Zhang, Shengming Cao, Rui Li, Xiaotong Zhao, Yutao Cui, Xinglin Hou, Gangshan Wu, Haolan Chen, Yu Xu, Limin Wang, Kai Ma
cs.AI

Abstract

Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Video (R2V) paradigm overlooks critical spatio-temporal misalignments common in real-world applications, leading to failures such as identity drift and visual artifacts. We introduce SteadyDancer, an Image-to-Video (I2V) paradigm-based framework that achieves harmonized and coherent animation and is the first to ensure first-frame preservation robustly. Firstly, we propose a Condition-Reconciliation Mechanism to harmonize the two conflicting conditions, enabling precise control without sacrificing fidelity. Secondly, we design Synergistic Pose Modulation Modules to generate an adaptive and coherent pose representation that is highly compatible with the reference image. Finally, we employ a Staged Decoupled-Objective Training Pipeline that hierarchically optimizes the model for motion fidelity, visual quality, and temporal coherence. Experiments demonstrate that SteadyDancer achieves state-of-the-art performance in both appearance fidelity and motion control, while requiring significantly fewer training resources than comparable methods.