CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
February 12, 2025
Authors: Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai
cs.AI
Abstract
In this work, we present CineMaster, a novel framework for 3D-aware and
controllable text-to-video generation. Our goal is to give users
controllability comparable to that of a professional film director: precise placement of
objects within the scene, flexible manipulation of both objects and camera in
3D space, and intuitive layout control over the rendered frames. To achieve
this, CineMaster operates in two stages. In the first stage, we design an
interactive workflow that allows users to intuitively construct 3D-aware
conditional signals by positioning object bounding boxes and defining camera
movements within the 3D space. In the second stage, these control
signals (rendered depth maps, camera trajectories, and object class
labels) guide a text-to-video diffusion model so that the generated
video matches the user's intent. Furthermore, to overcome the scarcity
of in-the-wild datasets with 3D object motion and camera pose annotations, we
carefully establish an automated data annotation pipeline that extracts 3D
bounding boxes and camera trajectories from large-scale video data. Extensive
qualitative and quantitative experiments demonstrate that CineMaster
significantly outperforms existing methods and achieves strong 3D-aware
text-to-video generation. Project page: https://cinemaster-dev.github.io/
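
To make the first-stage idea concrete, the following is a minimal sketch of how a 3D bounding box and a camera trajectory jointly determine a per-frame 2D layout and depth. It is not the authors' implementation: the pinhole camera model, the axis-aligned box, the rotation-free camera, and the names `box_corners`, `project`, and `layout_per_frame` are all assumptions made for illustration.

```python
import numpy as np

def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box as an (8, 3) array."""
    center = np.asarray(center, dtype=float)
    half = np.asarray(size, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return center + signs * half

def project(points_world, R, t, K):
    """Project world points with a pinhole camera (assumed model).

    R and t map world to camera coordinates: x_cam = R @ x_world + t.
    Returns pixel coordinates (N, 2) and camera-space depth (N,).
    """
    cam = points_world @ R.T + t
    depth = cam[:, 2]
    uvw = cam @ K.T
    pixels = uvw[:, :2] / uvw[:, 2:3]
    return pixels, depth

def layout_per_frame(center, size, camera_positions, K):
    """For each camera position, return the box's 2D extent and nearest depth.

    The camera is assumed to look along +z with no rotation (hypothetical
    simplification); a full trajectory would also vary R per frame.
    """
    corners = box_corners(center, size)
    R = np.eye(3)
    frames = []
    for pos in camera_positions:
        t = -R @ np.asarray(pos, dtype=float)
        pixels, depth = project(corners, R, t, K)
        x_min, y_min = pixels.min(axis=0)
        x_max, y_max = pixels.max(axis=0)
        frames.append(((x_min, y_min, x_max, y_max), depth.min()))
    return frames

# Hypothetical intrinsics for a 640x480 frame with a 500-pixel focal length.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# The camera dollies toward a unit box centred 5 units in front of the origin.
trajectory = [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
frames = layout_per_frame(center=(0, 0, 5), size=(1, 1, 1),
                          camera_positions=trajectory, K=K)
```

As the camera moves closer, the projected box grows and its nearest depth shrinks, which is the kind of per-frame layout and depth information that the rendered control signals are described as carrying.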