Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
December 9, 2025
Authors: Ruihang Chu, Yefei He, Zhekai Chen, Shiwei Zhang, Xiaogang Xu, Bin Xia, Dingdong Wang, Hongwei Yi, Xihui Liu, Hengshuang Zhao, Yu Liu, Yingya Zhang, Yujiu Yang
cs.AI
Abstract
We present Wan-Move, a simple and scalable framework that brings motion control to video generative models. Existing motion-controllable methods typically suffer from coarse control granularity and limited scalability, leaving their outputs insufficient for practical use. We narrow this gap by achieving precise, high-quality motion control. Our core idea is to make the original condition features directly motion-aware for guiding video synthesis. To this end, we first represent object motions with dense point trajectories, allowing fine-grained control over the scene. We then project these trajectories into latent space and propagate the first frame's features along each trajectory, producing an aligned spatiotemporal feature map that indicates how each scene element should move. This feature map serves as the updated latent condition, which integrates naturally into an off-the-shelf image-to-video model, e.g., Wan-I2V-14B, as motion guidance without any architecture change. It removes the need for auxiliary motion encoders and makes fine-tuning base models easily scalable. Through scaled training, Wan-Move generates 5-second, 480p videos whose motion controllability rivals Kling 1.5 Pro's commercial Motion Brush, as indicated by user studies. To support comprehensive evaluation, we further design MoveBench, a rigorously curated benchmark featuring diverse content categories and hybrid-verified annotations. It is distinguished by larger data volume, longer video durations, and high-quality motion annotations. Extensive experiments on MoveBench and a public dataset consistently show Wan-Move's superior motion quality. Code, models, and benchmark data are made publicly available.
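The core mechanism described above can be illustrated with a minimal sketch: take the first frame's latent features, look up the features at each trajectory's starting position, and scatter them to each point's position in every subsequent frame, yielding a spatiotemporal condition map aligned with the intended motion. This is a simplified, hypothetical illustration, not the paper's implementation: the function name, tensor shapes, and the nearest-grid-cell scatter (last write wins on collisions, empty cells left at zero) are all assumptions.

```python
import numpy as np

def propagate_features(first_latent, trajectories):
    """Hypothetical sketch of latent trajectory guidance.

    first_latent: (C, H, W) latent features of the first frame.
    trajectories: (T, N, 2) integer (y, x) positions on the latent grid
                  for N tracked points over T frames; frame 0 holds each
                  point's starting position.
    Returns: (T, C, H, W) motion-aware spatiotemporal condition map.
    """
    C, H, W = first_latent.shape
    T, N, _ = trajectories.shape
    cond = np.zeros((T, C, H, W), dtype=first_latent.dtype)

    # Sample each point's feature vector at its first-frame position.
    y0, x0 = trajectories[0, :, 0], trajectories[0, :, 1]
    point_feats = first_latent[:, y0, x0]  # (C, N)

    # Scatter those features to the points' positions in every frame,
    # dropping points that move off the latent grid.
    for t in range(T):
        yt, xt = trajectories[t, :, 0], trajectories[t, :, 1]
        inside = (yt >= 0) & (yt < H) & (xt >= 0) & (xt < W)
        cond[t][:, yt[inside], xt[inside]] = point_feats[:, inside]
    return cond
```

The resulting map has the same latent layout as the first-frame condition, which is what lets it replace that condition in an image-to-video model without architectural changes.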