CameraCtrl: Enabling Camera Control for Text-to-Video Generation
April 2, 2024
Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
cs.AI
Abstract
Controllability plays a crucial role in video generation, since it allows users to create the content they desire. However, existing models have largely overlooked precise control of the camera pose, which serves as a cinematic language for expressing deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models. After precisely parameterizing the camera trajectory, a plug-and-play camera module is trained on a T2V model, leaving the other components untouched. Additionally, we conduct a comprehensive study of the effect of various datasets, suggesting that videos with diverse camera distributions and similar appearances indeed enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise and domain-adaptive camera control, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs.
Our project website is at: https://hehao13.github.io/projects-CameraCtrl/.
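The abstract does not spell out the camera parameterization here; CameraCtrl represents each camera pose as a dense, per-pixel Plücker embedding derived from the intrinsics and extrinsics. The sketch below illustrates that general idea under standard pinhole-camera conventions; the function name and NumPy implementation are illustrative, not taken from the paper's code.

```python
import numpy as np

def plucker_embedding(K, R, t, H, W):
    """Per-pixel Plücker ray embedding (o x d, d) for one camera.

    K: (3, 3) intrinsics; R: (3, 3) world-to-camera rotation;
    t: (3,) translation (x_cam = R @ x_world + t). Returns (H, W, 6).
    """
    # Camera center in world coordinates: o = -R^T t
    o = -R.T @ t
    # Pixel grid sampled at pixel centers
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3)
    # Back-project to world-space ray directions: d = R^T K^{-1} [u, v, 1]^T
    # (row-vector form: p @ K^{-T} @ R)
    d = pix @ np.linalg.inv(K).T @ R                   # (H, W, 3)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)     # unit directions
    # Plücker coordinates: moment m = o x d, plus the direction d itself
    m = np.cross(np.broadcast_to(o, d.shape), d)
    return np.concatenate([m, d], axis=-1)             # (H, W, 6)
```

A map like this, computed for every frame along a trajectory, gives the camera module a spatially aligned conditioning signal in the same resolution as the video latents, which is what makes precise per-frame pose control possible.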