SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
November 24, 2025
Authors: Jiaming Zhang, Shengming Cao, Rui Li, Xiaotong Zhao, Yutao Cui, Xinglin Hou, Gangshan Wu, Haolan Chen, Yu Xu, Limin Wang, Kai Ma
cs.AI
Abstract
Preserving first-frame identity while ensuring precise motion control is a fundamental challenge in human image animation. The Image-to-Motion Binding process of the dominant Reference-to-Video (R2V) paradigm overlooks critical spatio-temporal misalignments common in real-world applications, leading to failures such as identity drift and visual artifacts. We introduce SteadyDancer, an Image-to-Video (I2V) paradigm-based framework that achieves harmonized and coherent animation and is the first to robustly ensure first-frame preservation. First, we propose a Condition-Reconciliation Mechanism to harmonize the two conflicting conditions, enabling precise control without sacrificing fidelity. Second, we design Synergistic Pose Modulation Modules to generate an adaptive and coherent pose representation that is highly compatible with the reference image. Finally, we employ a Staged Decoupled-Objective Training Pipeline that hierarchically optimizes the model for motion fidelity, visual quality, and temporal coherence. Experiments demonstrate that SteadyDancer achieves state-of-the-art performance in both appearance fidelity and motion control, while requiring significantly fewer training resources than comparable methods.