

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

May 12, 2026
作者: Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu
cs.AI

Abstract

Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine, an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots. To achieve this, we first train a causal base model on native multi-shot sequences to learn complex shot transitions prior to acceleration. We then propose Content-Aware Memory Routing (CAMR), which dynamically retrieves historical KV entries according to attention-based relevance scores rather than temporal proximity, preserving cross-shot coherence under bounded active memory. Finally, we distill the causal base model into a few-step generator for real-time interactive generation. Extensive experiments demonstrate that CausalCine significantly outperforms autoregressive baselines and approaches the capability of bidirectional models while unlocking the streaming interactivity of causal generation. Demo available at https://yihao-meng.github.io/CausalCine/
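The abstract describes Content-Aware Memory Routing (CAMR) as retrieving historical KV-cache entries by attention-based relevance to the current query rather than by temporal proximity, keeping active memory bounded. A minimal sketch of that retrieval idea is below; the function name, the mean-query scoring, and the fixed `budget` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def route_kv_by_relevance(query, keys, values, budget):
    """Select a bounded set of cached KV entries by relevance to the
    current query (scaled dot-product score) instead of recency.
    Shapes: query (n_q, d), keys/values (n_cache, d)."""
    q = query.mean(axis=0)                        # summarize current queries: (d,)
    scores = keys @ q / np.sqrt(keys.shape[-1])   # relevance of each cached key
    top = np.argsort(scores)[-budget:]            # keep the `budget` best entries
    top = np.sort(top)                            # restore temporal order among them
    return keys[top], values[top], top

# Toy cache: rows 2 and 4 align with the query direction, so recency-agnostic
# routing keeps them even though row 4 is not the most recent entry.
keys = np.array([[1., 0, 0, 0],
                 [0., 1, 0, 0],
                 [5., 0, 0, 0],
                 [0., 0, 1, 0],
                 [3., 0, 0, 0]])
values = keys.copy()
query = np.array([[1., 0, 0, 0]])
k_sel, v_sel, idx = route_kv_by_relevance(query, keys, values, budget=2)
```

A recency-based cache of size 2 would have kept entries 3 and 4; relevance routing instead retains entries 2 and 4, which is the behavior the abstract credits for preserving cross-shot coherence under a bounded active memory.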