Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
February 5, 2024
Authors: Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao
cs.AI
Abstract
Recent text-to-video diffusion models have achieved impressive progress. In
practice, users often desire the ability to control object motion and camera
movement independently for customized video creation. However, current methods
lack decoupled control over object motion and camera movement, which limits the
controllability and flexibility of
text-to-video models. In this paper, we introduce Direct-a-Video, a system that
allows users to independently specify motions for one or multiple objects
and/or camera movements, as if directing a video. We propose a simple yet
effective strategy for the decoupled control of object motion and camera
movement. Object motion is controlled through spatial cross-attention
modulation using the model's inherent priors, requiring no additional
optimization. For camera movement, we introduce new temporal cross-attention
layers to interpret quantitative camera movement parameters. We further employ
an augmentation-based approach to train these layers in a self-supervised
manner on a small-scale dataset, eliminating the need for explicit motion
annotation. Both components operate independently, allowing individual or
combined control, and can generalize to open-domain scenarios. Extensive
experiments demonstrate the superiority and effectiveness of our method.
Project page: https://direct-a-video.github.io/.
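
To make the object-motion mechanism more concrete, below is a minimal sketch of spatial cross-attention modulation during denoising: attention from spatial positions to the text tokens naming an object is boosted inside the user-specified per-frame box and damped outside it, steering where the object appears without any extra optimization. The function name, tensor layout, and the gain/suppress scheme are illustrative assumptions, not the paper's exact formulation; the camera-movement branch would instead feed quantitative pan/zoom parameters to newly added temporal cross-attention layers.

```python
import torch

def modulate_cross_attention(attn, box_masks, token_idx, gain=5.0, suppress=0.1):
    """Hypothetical sketch of spatial cross-attention modulation for object motion.

    attn:      [F, H, HW, T] softmaxed cross-attention maps
               (F frames, H heads, HW spatial positions, T text tokens).
    box_masks: [F, HW] float masks, 1.0 inside the user-drawn box on each frame
               (e.g., rasterized from start/end boxes interpolated over time).
    token_idx: indices of the text tokens describing the object.
    Returns the modulated, renormalized attention maps.
    """
    F_, _, HW, T = attn.shape
    scale = torch.ones(F_, 1, HW, T, device=attn.device, dtype=attn.dtype)
    inside = box_masks.view(F_, 1, HW, 1)  # broadcast over heads and tokens
    # Boost the object's text tokens inside the box and damp them outside,
    # so the object is synthesized along the user-specified trajectory.
    scale[..., token_idx] = inside * gain + (1.0 - inside) * suppress
    attn = attn * scale
    return attn / attn.sum(dim=-1, keepdim=True)  # renormalize over text tokens
```

In practice such a hook would be registered on the cross-attention layers of a pretrained text-to-video UNet at inference time, which is consistent with the abstract's claim that object motion control relies on the model's inherent priors and requires no additional training.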