DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data
April 2, 2026
Authors: Wonjoon Jin, Jiyun Won, Janghyeok Han, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho
cs.AI
Abstract
Despite recent progress, video diffusion models still struggle to synthesize realistic videos that involve highly dynamic motion or require fine-grained motion control. A central limitation is the scarcity of such examples in commonly used training datasets. To address this, we introduce DynaVid, a video synthesis framework that trains on synthetic motion data, represented as optical flow and rendered with computer graphics pipelines. This approach has two key advantages. First, synthetic motion provides diverse motion patterns and precise control signals that are difficult to obtain from real data. Second, unlike rendered videos, which have an artificial appearance, rendered optical flow encodes only motion and is decoupled from appearance, which prevents the model from reproducing the unnatural look of synthetic videos. Building on this idea, DynaVid adopts a two-stage generation framework: a motion generator first synthesizes motion, and a motion-guided video generator then produces video frames conditioned on that motion. This decoupled formulation lets the model learn dynamic motion patterns from synthetic data while preserving the visual realism of real-world videos. We validate the framework on two challenging scenarios where existing datasets are particularly limited: vigorous human motion generation and extreme camera motion control. Extensive experiments demonstrate that DynaVid improves both realism and controllability in dynamic motion generation and camera motion control.
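The two-stage structure described in the abstract can be illustrated with a toy sketch. It is not DynaVid's implementation: the function names (`generate_motion`, `generate_video`, `warp`), the uniform-shift flow, and the warping step that stands in for the motion-guided video generator are all assumptions made for illustration. The only point it makes is that a flow sequence carries motion alone, so the same motion can drive frames with entirely different appearance.

```python
# Toy illustration only. All names and the constant-shift "motion" are
# hypothetical stand-ins, not DynaVid's actual models or API.
from typing import List, Tuple

Flow = List[List[Tuple[int, int]]]  # per-pixel (dx, dy) displacement
Frame = List[List[int]]             # grayscale image as nested lists


def generate_motion(num_frames: int, height: int, width: int,
                    dx: int = 1, dy: int = 0) -> List[Flow]:
    """Stage 1 stand-in: one flow field per frame transition (uniform shift)."""
    return [[[(dx, dy) for _ in range(width)] for _ in range(height)]
            for _ in range(num_frames - 1)]


def warp(frame: Frame, flow: Flow) -> Frame:
    """Move pixel (x, y) to (x + dx, y + dy), wrapping around at the borders."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            out[(y + dy) % height][(x + dx) % width] = frame[y][x]
    return out


def generate_video(first_frame: Frame, flows: List[Flow]) -> List[Frame]:
    """Stage 2 stand-in: produce frames conditioned on the motion sequence."""
    frames = [first_frame]
    for flow in flows:
        frames.append(warp(frames[-1], flow))
    return frames


# One motion sequence, two different appearances.
flows = generate_motion(num_frames=3, height=2, width=3)
video_a = generate_video([[1, 0, 0], [0, 0, 0]], flows)
video_b = generate_video([[9, 8, 7], [6, 5, 4]], flows)
```

Here `video_a` and `video_b` share the same rightward motion but differ in pixel content, which mirrors the appearance-decoupling argument in the abstract. In the framework described above, both stages are learned generators rather than hand-written functions.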