
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

December 2, 2025
Authors: Qinghe Wang, Xiaoyu Shi, Baolu Li, Weikang Bian, Quande Liu, Huchuan Lu, Xintao Wang, Pengfei Wan, Kun Gai, Xu Jia
cs.AI

Abstract

Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, coherent narrative logic, and controllability beyond text prompts. To tackle these challenges, we propose MultiShotMaster, a framework for highly controllable multi-shot video generation. We extend a pretrained single-shot model by integrating two novel RoPE variants. First, we introduce Multi-Shot Narrative RoPE, which applies an explicit phase shift at shot transitions, enabling flexible shot arrangement while preserving the temporal narrative order. Second, we design Spatiotemporal Position-Aware RoPE, which incorporates reference tokens and grounding signals to enable spatiotemporally grounded reference injection. In addition, to overcome data scarcity, we establish an automated annotation pipeline that extracts multi-shot videos, captions, cross-shot grounding signals, and reference images. Our framework leverages these intrinsic architectural properties to support multi-shot video generation with text-driven inter-shot consistency, customized subjects with motion control, and background-driven scene customization, while both shot count and shot duration remain flexibly configurable. Extensive experiments demonstrate the superior generation quality and controllability of our framework.
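The central mechanism named in the abstract is a temporal RoPE whose position index receives an explicit phase offset at each shot boundary. The abstract gives no implementation details, so the snippet below is only a minimal sketch of that idea, not the authors' code: the constant `shot_phase_offset`, the function names, and the per-shot indexing scheme are all assumptions introduced for illustration.

```python
# Minimal sketch (assumptions, not the paper's implementation) of temporal RoPE
# with an explicit per-shot phase shift at shot transitions.
import torch

def temporal_rope_angles(frame_idx, shot_idx, dim, shot_phase_offset=64.0, base=10000.0):
    """Return cos/sin tables for temporal RoPE with a per-shot phase shift.

    frame_idx: (T,) global frame positions
    shot_idx:  (T,) shot index of each frame (0, 1, 2, ...)
    dim:       attention head dimension (even)
    """
    # Standard RoPE inverse frequencies, one per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Hypothetical phase shift: every later shot is pushed further along the
    # positional axis, creating a clear discontinuity at each transition.
    pos = frame_idx.float() + shot_idx.float() * shot_phase_offset
    angles = pos[:, None] * inv_freq[None, :]          # (T, dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    """Rotate channel pairs of x with shape (T, dim) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)
```

In this reading, frame-token queries and keys would be rotated with these tables before attention: tokens within one shot keep ordinary relative offsets, while tokens from different shots are separated by the extra phase, which is one plausible way to mark transitions while keeping the overall temporal order intact.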