Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
May 27, 2024
Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein
cs.AI
Abstract
Research on video generation has recently made tremendous progress, enabling
high-quality videos to be generated from text prompts or images. Adding control
to the video generation process is an important goal moving forward and recent
approaches that condition video generation models on camera trajectories make
strides towards it. Yet, it remains challenging to generate a video of the same
scene from multiple different camera trajectories. Solutions to this
multi-video generation problem could enable large-scale 3D scene generation
with editable camera trajectories, among other applications. We introduce
collaborative video diffusion (CVD) as an important step towards this vision.
The CVD framework includes a novel cross-video synchronization module that
promotes consistency between corresponding frames of the same video rendered
from different camera poses using an epipolar attention mechanism. Trained on
top of a state-of-the-art camera-control module for video generation, CVD
generates multiple videos rendered from different camera trajectories with
significantly better consistency than baselines, as shown in extensive
experiments. Project page: https://collaborativevideodiffusion.github.io/.
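The abstract describes a cross-video synchronization module that uses epipolar attention to align corresponding frames rendered from different camera poses. The paper's actual module is a learned component inside a diffusion model and is not specified here; as an illustration only, the following is a minimal NumPy sketch of the underlying geometric idea: cross-frame attention whose logits are biased toward key pixels lying near the query pixel's epipolar line. All function names, the Gaussian form of the bias, and the `sigma` parameter are assumptions for this sketch, not details from the paper.

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K, R, t):
    """F = K^-T [t]_x R K^-1 for a camera pair with shared intrinsics K
    and relative pose (R, t). Standard two-view geometry, not CVD-specific."""
    Kinv = np.linalg.inv(K)
    return Kinv.T @ skew(t) @ R @ Kinv

def epipolar_attention(feat_a, feat_b, coords, F, sigma=2.0):
    """Each query pixel in view A attends over all pixels in view B, with a
    Gaussian bias favoring keys near the query's epipolar line in B.
    feat_a, feat_b: (N, C) flattened per-pixel features; coords: (N, 2) pixels.
    """
    N, C = feat_a.shape
    homog = np.concatenate([coords, np.ones((N, 1))], axis=1)  # (N, 3)
    lines = homog @ F.T                       # epipolar line in B per A-pixel
    # Point-to-line distance for every (query, key) pair.
    num = np.abs(lines @ homog.T)                              # (N, N)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    dist = num / den
    bias = -(dist ** 2) / (2.0 * sigma ** 2)  # Gaussian epipolar bias (assumed)
    logits = feat_a @ feat_b.T / np.sqrt(C) + bias
    logits -= logits.max(axis=1, keepdims=True)                # stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ feat_b                          # epipolar-weighted features

# Toy usage: identity intrinsics, pure sideways translation.
K = np.eye(3)
F = fundamental_matrix(K, np.eye(3), np.array([1.0, 0.0, 0.0]))
coords = np.stack(np.meshgrid(np.arange(4.0), np.arange(4.0)), -1).reshape(-1, 2)
rng = np.random.default_rng(0)
fa, fb = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
out = epipolar_attention(fa, fb, coords, F)
```

In the paper's setting the attention would operate on learned diffusion features with trainable projections; this sketch only demonstrates how epipolar geometry can softly restrict which cross-view locations a pixel attends to.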