ChatPaper.ai


Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

January 29, 2024
Authors: Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
cs.AI

Abstract

We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the reference image's pixels. For the second stage, we propose motion-augmented temporal attention to enhance the limited 1-D temporal attention in video latent diffusion models. This module can effectively propagate the reference image's features to the synthesized frames under the guidance of the trajectories predicted in the first stage. Compared with existing methods, Motion-I2V can generate more consistent videos, even in the presence of large motion and viewpoint variation. By training a sparse trajectory ControlNet for the first stage, Motion-I2V allows users to precisely control motion trajectories and motion regions with sparse trajectory and region annotations. This offers more controllability over the I2V process than relying solely on textual instructions. Additionally, Motion-I2V's second stage naturally supports zero-shot video-to-video translation. Both qualitative and quantitative comparisons demonstrate the advantages of Motion-I2V over prior approaches in consistent and controllable image-to-video generation.
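To make the second stage concrete, the idea of trajectory-guided feature propagation can be loosely sketched as follows: reference-frame features are warped along the predicted per-pixel displacements, and each synthesized frame attends to its warped reference features. This is a minimal, illustrative NumPy sketch, not the paper's implementation; the function name, the per-pixel two-token attention, the nearest-neighbor warping, and the assumption that `flows` gives frame-to-reference displacements are all simplifications of ours.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def motion_augmented_attention(ref_feat, frame_feats, flows):
    """Illustrative trajectory-guided attention (assumed interface).

    ref_feat:    (H, W, C) reference-frame features
    frame_feats: (T, H, W, C) features of the frames being synthesized
    flows:       (T, H, W, 2) assumed per-pixel (dx, dy) displacements
                 mapping each frame position back to the reference frame
    Returns:     (T, H, W, C) features after attending to the warped reference
    """
    T, H, W, C = frame_feats.shape
    out = np.empty_like(frame_feats)
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    for t in range(T):
        # Nearest-neighbor warp: fetch the reference feature that the
        # predicted trajectory says each pixel came from.
        src_y = np.clip(np.round(ys + flows[t, ..., 1]).astype(int), 0, H - 1)
        src_x = np.clip(np.round(xs + flows[t, ..., 0]).astype(int), 0, W - 1)
        warped = ref_feat[src_y, src_x]               # (H, W, C)
        q = frame_feats[t].reshape(-1, C)             # queries: frame features
        k = warped.reshape(-1, C)                     # keys: warped reference
        # Per-pixel attention over two tokens (the frame feature itself and
        # its warped reference feature) -- a toy stand-in for the paper's
        # motion-augmented temporal attention.
        tokens = np.stack([q, k], axis=1)             # (HW, 2, C)
        scores = (q[:, None, :] * tokens).sum(-1) / np.sqrt(C)
        w = softmax(scores, axis=-1)                  # (HW, 2)
        out[t] = (w[..., None] * tokens).sum(1).reshape(H, W, C)
    return out
```

With zero flow and frame features equal to the reference features, the attention is a no-op, which matches the intuition that a static scene requires no propagation.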