

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

May 12, 2026
作者: Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu
cs.AI

Abstract

Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine, an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots. To achieve this, we first train a causal base model on native multi-shot sequences to learn complex shot transitions prior to acceleration. We then propose Content-Aware Memory Routing (CAMR), which dynamically retrieves historical KV entries according to attention-based relevance scores rather than temporal proximity, preserving cross-shot coherence under bounded active memory. Finally, we distill the causal base model into a few-step generator for real-time interactive generation. Extensive experiments demonstrate that CausalCine significantly outperforms autoregressive baselines and approaches the capability of bidirectional models while unlocking the streaming interactivity of causal generation. Demo available at https://yihao-meng.github.io/CausalCine/
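The core idea behind Content-Aware Memory Routing — retrieving cached KV entries by attention relevance scores instead of temporal proximity — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the function name `route_memory`, the plain dot-product scoring, and the toy cache are all assumptions made for clarity.

```python
import numpy as np

def route_memory(query, mem_keys, mem_values, budget):
    """Return the `budget` cached KV entries most relevant to `query`,
    scored by dot-product attention relevance rather than recency."""
    scores = mem_keys @ query                      # one relevance score per cached key
    keep = np.sort(np.argsort(scores)[-budget:])   # top-k entries, kept in temporal order
    return mem_keys[keep], mem_values[keep], keep

# Toy cache: 6 entries with orthogonal keys; the query matches entry 1 exactly,
# so entry 1 is retained even though it is not among the most recent entries.
keys = np.eye(6)
values = np.arange(6, dtype=float)[:, None] * np.ones((6, 3))
_, v_sel, kept = route_memory(keys[1], keys, values, budget=3)
print(kept)  # always contains index 1; a recency window of size 3 would keep only [3, 4, 5]
```

Under a bounded active memory, a recency window would evict the matching entry once it falls outside the window; relevance-based routing keeps it available, which is what preserves cross-shot coherence when a later shot revisits earlier content.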