MotionPro: A Precise Motion Controller for Image-to-Video Generation
May 26, 2025
Authors: Zhongwei Zhang, Fuchen Long, Zhaofan Qiu, Yingwei Pan, Wu Liu, Ting Yao, Tao Mei
cs.AI
Abstract
Animating images with interactive motion control has gained popularity in
image-to-video (I2V) generation. Existing approaches typically rely on large
Gaussian kernels to extend motion trajectories as the condition, without
explicitly defining the movement region. This leads to coarse motion control
and fails to disentangle object motion from camera motion. To address these
issues, we present MotionPro, a precise motion controller that leverages
region-wise trajectories to regulate fine-grained motion synthesis and a motion
mask to identify the target motion category (i.e., object or camera motion).
Technically, MotionPro first estimates the flow maps of each training video
with a tracking model, and then samples region-wise trajectories to simulate
the inference scenario. Instead of extending flow through large Gaussian
kernels, the region-wise trajectory approach directly uses the trajectories
within local regions, which enables more precise control and better
characterizes fine-grained movements. A motion mask is simultaneously derived
from the predicted flow maps to capture the holistic motion dynamics of the
movement regions. To pursue natural motion control, MotionPro further
strengthens video denoising by incorporating both the region-wise trajectories
and the motion mask through feature modulation. In addition, we construct a
benchmark, MC-Bench, with 1.1K user-annotated image-trajectory pairs for
evaluating both fine-grained and object-level I2V motion control. Extensive
experiments on WebVid-10M and MC-Bench demonstrate the effectiveness of
MotionPro. Please refer to our project page for more results:
https://zhw-zhang.github.io/MotionPro-page/.
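
The two conditioning signals described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how regions are sampled or how the mask is computed, so the block size `region`, the keep probability `keep_prob`, the magnitude threshold, and both function names are assumptions made purely for illustration. The sketch also works on the flow map of a single frame rather than a full video.

```python
import numpy as np


def motion_mask(flow, threshold=1.0):
    """Binary mask of pixels whose flow magnitude exceeds `threshold`.

    Thresholding the magnitude is an assumed rule, not taken from the paper.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude > threshold


def sample_region_trajectories(flow, region=8, keep_prob=0.1, rng=None):
    """Keep flow only inside randomly chosen `region` x `region` blocks.

    This is a hypothetical sampling scheme: flow outside the selected
    blocks is zeroed instead of being spread by a large Gaussian kernel.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = flow.shape
    grid_h, grid_w = -(-h // region), -(-w // region)  # ceiling division
    block_keep = rng.random((grid_h, grid_w)) < keep_prob
    # Expand the per-block decision to pixel resolution and crop to (h, w).
    pixel_keep = np.repeat(np.repeat(block_keep, region, axis=0), region, axis=1)
    pixel_keep = pixel_keep[:h, :w]
    return flow * pixel_keep[..., None], pixel_keep


# Synthetic flow map: a 16x16 square moving 3 pixels to the right.
flow = np.zeros((32, 32, 2), dtype=np.float32)
flow[8:24, 8:24] = (3.0, 0.0)

mask = motion_mask(flow)
sparse_flow, keep = sample_region_trajectories(
    flow, region=8, keep_prob=0.5, rng=np.random.default_rng(0)
)
```

Here `mask` marks the whole moving square, while `sparse_flow` retains the original flow vectors only inside the sampled blocks, which mirrors the abstract's contrast between a holistic motion mask and local region-wise trajectories.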