CameraCtrl: Enabling Camera Control for Text-to-Video Generation

April 2, 2024
Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
cs.AI

Abstract

Controllability plays a crucial role in video generation because it allows users to create the content they want. However, existing models largely overlook precise control of camera pose, which serves as a cinematic language for expressing deeper narrative nuances. To address this, we introduce CameraCtrl, which enables accurate camera pose control for text-to-video (T2V) models. After the camera trajectory is precisely parameterized, a plug-and-play camera module is trained on a T2V model, leaving the rest of the model untouched. We also conduct a comprehensive study of the effect of various datasets, which suggests that videos with diverse camera distributions and similar appearances enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise and domain-adaptive camera control, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. Our project website is at: https://hehao13.github.io/projects-CameraCtrl/.
