

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

October 23, 2025
Authors: Yihao Meng, Hao Ouyang, Yue Yu, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Hanlin Wang, Yixuan Li, Cheng Chen, Yanhong Zeng, Yujun Shen, Huamin Qu
cs.AI

Abstract

State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating coherent, multi-shot narratives, which are the essence of storytelling. We bridge this "narrative gap" with HoloCine, a model that generates entire scenes holistically to ensure global consistency from the first shot to the last. Our architecture achieves precise directorial control through a Window Cross-Attention mechanism that localizes text prompts to specific shots, while a Sparse Inter-Shot Self-Attention pattern (dense within shots but sparse between them) ensures the efficiency required for minute-scale generation. Beyond setting a new state-of-the-art in narrative coherence, HoloCine develops remarkable emergent abilities: a persistent memory for characters and scenes, and an intuitive grasp of cinematic techniques. Our work marks a pivotal shift from clip synthesis towards automated filmmaking, making end-to-end cinematic creation a tangible future. Our code is available at: https://holo-cine.github.io/.
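The attention pattern described above (dense within each shot, sparse between shots) can be pictured as a block-structured attention mask. The sketch below is purely illustrative and not the authors' implementation: it assumes tokens are grouped contiguously by shot and, as a hypothetical sparse rule, lets each query attend to only the first `link_tokens` tokens of every other shot; the paper's actual inter-shot pattern may differ.

```python
import numpy as np

def inter_shot_mask(shot_lengths, link_tokens=1):
    """Boolean attention mask: dense intra-shot, sparse inter-shot.

    `link_tokens` is an illustrative assumption: across shots, each
    query may attend only to the first few tokens of every other shot.
    """
    total = sum(shot_lengths)
    mask = np.zeros((total, total), dtype=bool)
    starts = np.cumsum([0] + list(shot_lengths))
    for i, li in enumerate(shot_lengths):
        a, b = starts[i], starts[i] + li
        mask[a:b, a:b] = True  # dense attention within the same shot
        for j, lj in enumerate(shot_lengths):
            if j != i:
                c = starts[j]
                # sparse cross-shot links to a handful of tokens per shot
                mask[a:b, c:c + min(link_tokens, lj)] = True
    return mask

# Three shots of 3, 3, and 2 tokens: an 8x8 mask where each row sees its
# own shot fully, plus one "link" token from each of the other shots.
m = inter_shot_mask([3, 3, 2])
```

Such a mask could be passed to a standard attention implementation (e.g. as `attn_mask` in `torch.nn.functional.scaled_dot_product_attention`) to realize the dense-within / sparse-between trade-off the abstract describes.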