

Captain Cinema: Towards Short Movie Generation

July 24, 2025
Authors: Junfei Xiao, Ceyuan Yang, Lvmin Zhang, Shengqu Cai, Yang Zhao, Yuwei Guo, Gordon Wetzstein, Maneesh Agrawala, Alan Yuille, Lu Jiang
cs.AI

Abstract

We present Captain Cinema, a framework for short movie generation. Given a detailed textual description of a movie storyline, our approach first generates a sequence of keyframes that outline the entire narrative, ensuring long-range coherence in both the storyline and the visual appearance (e.g., scenes and characters). We refer to this step as top-down keyframe planning. These keyframes then serve as conditioning signals for a video synthesis model that supports long-context learning, which produces the spatio-temporal dynamics between them. We refer to this step as bottom-up video synthesis. To support stable and efficient generation of multi-scene, long-narrative cinematic works, we introduce an interleaved training strategy for Multimodal Diffusion Transformers (MM-DiT), specifically adapted to long-context video data. Our model is trained on a specially curated cinematic dataset consisting of interleaved data pairs. Our experiments demonstrate that Captain Cinema performs favorably in the automated creation of visually coherent and narratively consistent short movies, with high quality and efficiency. Project page: https://thecinema.ai
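The two-stage pipeline described above can be sketched in outline. This is a minimal illustrative sketch, not the paper's implementation: the function names (`plan_keyframes`, `synthesize_clips`), the `Keyframe` class, and the stand-in planning logic are all hypothetical, since the actual models are not released.

```python
# Hypothetical sketch of Captain Cinema's two-stage pipeline.
# Stage 1 (top-down keyframe planning) and stage 2 (bottom-up video
# synthesis) are stubbed with simple string logic in place of the
# paper's keyframe generator and long-context MM-DiT model.
from dataclasses import dataclass


@dataclass
class Keyframe:
    index: int
    prompt: str  # per-scene description derived from the storyline


def plan_keyframes(storyline: str, n_scenes: int) -> list[Keyframe]:
    """Top-down keyframe planning: outline the full narrative as an
    ordered sequence of keyframes so long-range coherence (scenes,
    characters) is fixed before any video is synthesized."""
    sentences = [s.strip() for s in storyline.split(".") if s.strip()]
    sentences = (sentences * n_scenes)[:n_scenes]  # pad/truncate to n_scenes
    return [Keyframe(i, s) for i, s in enumerate(sentences)]


def synthesize_clips(keyframes: list[Keyframe]) -> list[str]:
    """Bottom-up video synthesis: each adjacent keyframe pair conditions
    one clip that fills in the spatio-temporal dynamics between them."""
    return [
        f"clip({a.index}->{b.index}): {a.prompt} => {b.prompt}"
        for a, b in zip(keyframes, keyframes[1:])
    ]


story = "A captain boards her ship. A storm hits at sea. The crew reaches shore."
frames = plan_keyframes(story, n_scenes=3)
clips = synthesize_clips(frames)
```

With three planned keyframes, synthesis yields two bridging clips, mirroring how the keyframes act as conditioning signals for the video model between consecutive story beats.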
PDF · July 25, 2025