CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
June 4, 2024
Authors: Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat
cs.AI
Abstract
Recently, video diffusion models have emerged as expressive generative tools for high-quality video content creation that are readily available to general users. However, these models often do not offer precise control over camera poses for video generation, limiting the expression of cinematic language and user control. To address this issue, we introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. We equip a pre-trained image-to-video generator with accurately parameterized camera pose input using Plücker coordinates. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module into each attention block that enforces epipolar constraints on the feature maps. Additionally, we fine-tune CamCo on real-world videos with camera poses estimated through structure-from-motion algorithms to better synthesize object motion. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models while effectively generating plausible object motion.

Project page: https://ir1d.github.io/CamCo/
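To make the camera conditioning concrete: Plücker coordinates represent each pixel's viewing ray as a 6-vector (direction, moment), giving the generator a dense, per-pixel camera encoding. Below is a minimal sketch, not the authors' code, assuming a standard pinhole camera with intrinsics `K` and world-to-camera extrinsics `R`, `t`:

```python
# Minimal sketch (assumed pinhole convention, not the paper's implementation)
# of a per-pixel Plücker ray map used as camera-pose conditioning.
import numpy as np

def plucker_embedding(K, R, t, H, W):
    """Return an (H, W, 6) Plücker ray map: (direction, moment) per pixel."""
    # Camera center in world coordinates: o = -R^T t.
    o = -R.T @ t  # shape (3,)

    # Pixel grid sampled at pixel centers.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project pixels to camera-frame rays, then rotate to world frame.
    d_cam = pix @ np.linalg.inv(K).T  # (H, W, 3)
    d_world = d_cam @ R               # row-wise R^T @ d_cam
    d_world /= np.linalg.norm(d_world, axis=-1, keepdims=True)

    # Plücker coordinates: direction d and moment m = o x d.
    m = np.cross(np.broadcast_to(o, d_world.shape), d_world)
    return np.concatenate([d_world, m], axis=-1)  # (H, W, 6)
```

The resulting 6-channel map has the same spatial layout as the video frames, so it can be concatenated or injected into the generator's feature maps frame by frame.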
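The epipolar constraint the abstract refers to says that a pixel in one frame can only correspond to points along its epipolar line in another frame. One hypothetical way to enforce this in attention (the paper does not spell out the exact mechanism here) is to mask cross-frame attention to a band around each query pixel's epipolar line; `F` below is the fundamental matrix from the query frame to the key frame, and `threshold` is an assumed band width in pixels:

```python
# Hypothetical sketch of an epipolar attention mask; the paper's actual
# module architecture is not specified in this abstract.
import torch

def epipolar_mask(F, H, W, threshold=1.0):
    """Boolean (H*W, H*W) mask: True where a key pixel lies within
    `threshold` pixels of the query pixel's epipolar line."""
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pts = torch.stack([xs.flatten().float(), ys.flatten().float(),
                       torch.ones(H * W)], dim=-1)  # (N, 3) homogeneous pixels

    lines = pts @ F.T  # (N, 3): epipolar line a*x + b*y + c = 0 per query pixel

    # Point-to-line distance for every (query, key) pair: |l . p| / ||(a, b)||.
    num = (lines @ pts.T).abs()  # (N_query, N_key)
    den = lines[:, :2].norm(dim=-1, keepdim=True).clamp(min=1e-8)
    return (num / den) < threshold
```

In use, attention logits between frame pairs would be set to negative infinity wherever the mask is False, so each feature can only aggregate information from geometrically plausible locations in other frames.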