Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

December 9, 2025
Authors: Ruihang Chu, Yefei He, Zhekai Chen, Shiwei Zhang, Xiaogang Xu, Bin Xia, Dingdong Wang, Hongwei Yi, Xihui Liu, Hengshuang Zhao, Yu Liu, Yingya Zhang, Yujiu Yang
cs.AI

Abstract

We present Wan-Move, a simple and scalable framework that brings motion control to video generative models. Existing motion-controllable methods typically suffer from coarse control granularity and limited scalability, leaving their outputs insufficient for practical use. We narrow this gap by achieving precise and high-quality motion control. Our core idea is to directly make the original condition features motion-aware for guiding video synthesis. To this end, we first represent object motions with dense point trajectories, allowing fine-grained control over the scene. We then project these trajectories into latent space and propagate the first frame's features along each trajectory, producing an aligned spatiotemporal feature map that tells how each scene element should move. This feature map serves as the updated latent condition, which is naturally integrated into the off-the-shelf image-to-video model, e.g., Wan-I2V-14B, as motion guidance without any architecture change. It removes the need for auxiliary motion encoders and makes fine-tuning base models easily scalable. Through scaled training, Wan-Move generates 5-second, 480p videos whose motion controllability rivals Kling 1.5 Pro's commercial Motion Brush, as indicated by user studies. To support comprehensive evaluation, we further design MoveBench, a rigorously curated benchmark featuring diverse content categories and hybrid-verified annotations. It is distinguished by larger data volume, longer video durations, and high-quality motion annotations. Extensive experiments on MoveBench and the public dataset consistently show Wan-Move's superior motion quality. Code, models, and benchmark data are made publicly available.
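The trajectory-guided conditioning described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function name, the array layouts, the `stride` downsampling factor, the zero fill for untracked locations, and the "last write wins" rule for colliding tracks are all assumptions made for illustration, and temporal compression of the latent is ignored.

```python
import numpy as np


def propagate_first_frame_features(first_latent, tracks, visible, stride=8):
    """Copy first-frame latent features along point trajectories.

    Hypothetical helper, not part of any released Wan-Move code.

    first_latent: (C, h, w) latent features of the first frame.
    tracks:       (T, N, 2) pixel-space (x, y) positions of N points over T frames.
    visible:      (T, N) boolean visibility mask for each point and frame.
    stride:       assumed pixel-to-latent spatial downsampling factor.
    Returns a (T, C, h, w) spatiotemporal condition map.
    """
    C, h, w = first_latent.shape
    T = tracks.shape[0]

    # Project pixel coordinates onto the latent grid.
    grid = np.floor(tracks / stride).astype(int)
    gx, gy = grid[..., 0], grid[..., 1]
    in_bounds = (gx >= 0) & (gx < w) & (gy >= 0) & (gy < h)

    # Frame 0 keeps the original first-frame latent; later frames start empty.
    cond = np.zeros((T, C, h, w), dtype=first_latent.dtype)
    cond[0] = first_latent

    # A track can only carry a feature if its starting point is usable.
    start_ok = visible[0] & in_bounds[0]
    for t in range(1, T):
        ok = start_ok & visible[t] & in_bounds[t]
        src_x, src_y = gx[0, ok], gy[0, ok]
        dst_x, dst_y = gx[t, ok], gy[t, ok]
        # Place each point's first-frame feature at its position in frame t.
        # If several tracks land on one cell, the last one written remains.
        cond[t][:, dst_y, dst_x] = first_latent[:, src_y, src_x]
    return cond
```

In this sketch the returned map has the same channel and spatial shape as the first-frame latent at every time step, which is what would let it stand in for the usual image condition of an image-to-video model without any architecture change.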