SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control
May 27, 2026
Authors: Zhida Zhang, Jie Ma, Zhan Peng, Haoxue Wu, Yang Han, Jun Liang, Jie Cao, Jing Li
cs.AI
Abstract
The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse conditioning signals such as text prompts or first/last frames, which limits precise control over narrative structure and temporal pacing. In this paper, we propose SmartDirector, a framework that enhances the narrative capacity of video generation models through multiple keyframes. SmartDirector supports flexible generation scenarios including single-shot generation, multi-shot narrative synthesis, and video extension. The framework operates in two stages: Director-Gen generates a low-resolution video conditioned on the provided keyframes, and Director-SR refines the output by exploiting high-resolution keyframes as semantic anchors to recover fine-grained details. To enable robust multi-keyframe training, we construct a data pipeline that curates single-shot and multi-shot sequences from movies. Extensive experiments demonstrate that SmartDirector substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research.
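The abstract does not say how keyframes are injected into Director-Gen or Director-SR. One common scheme for frame-conditioned video diffusion places each keyframe at its target frame index and passes a binary mask marking which positions are pinned. The sketch below assumes that scheme. Keyframe times are mapped to frame indices, which is how keyframe placement can express pacing. Everything here is a hypothetical illustration rather than SmartDirector's API: `Keyframe`, `downsample`, `build_condition`, the time-to-index rule `round(time_s * fps)`, and the 2x low-resolution factor.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[float]]  # grayscale H x W frame, kept tiny for illustration


@dataclass
class Keyframe:
    time_s: float  # hypothetical: when this keyframe should appear in the clip
    image: Frame   # full-resolution keyframe (all keyframes share one size)


def downsample(frame: Frame, factor: int) -> Frame:
    """Average-pool a frame by an integer factor (factor 1 returns a copy)."""
    h, w = len(frame), len(frame[0])
    if h % factor or w % factor:
        raise ValueError("frame size must be divisible by factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [frame[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def build_condition(keyframes: List[Keyframe], num_frames: int, fps: float,
                    factor: int = 1) -> Tuple[List[Frame], List[int]]:
    """Place keyframes on the frame timeline; return (condition frames, mask).

    Positions without a keyframe are zero-filled and get mask 0, so a
    generator can tell pinned frames apart from frames it must synthesize.
    Moving a keyframe's time_s shifts when that event happens (pacing).
    """
    first = keyframes[0].image
    h, w = len(first) // factor, len(first[0]) // factor
    cond = [[[0.0] * w for _ in range(h)] for _ in range(num_frames)]
    mask = [0] * num_frames
    for kf in keyframes:
        idx = round(kf.time_s * fps)  # assumed time-to-frame mapping
        if not 0 <= idx < num_frames:
            raise ValueError(f"keyframe at {kf.time_s}s falls outside the clip")
        cond[idx] = downsample(kf.image, factor)
        mask[idx] = 1
    return cond, mask


# Hypothetical two-stage use: low-res condition for the generation stage,
# full-res keyframes at the same indices as anchors for the SR stage.
kfs = [Keyframe(0.0, [[1.0] * 4 for _ in range(4)]),
       Keyframe(1.0, [[0.0, 2.0, 4.0, 6.0] for _ in range(4)])]
lr_cond, lr_mask = build_condition(kfs, num_frames=25, fps=24, factor=2)
hr_cond, hr_mask = build_condition(kfs, num_frames=25, fps=24)
```

Reusing one placement function for both resolutions keeps the high-resolution anchors aligned with the low-resolution frames they are meant to refine. A real system would operate on latent tensors rather than nested lists.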