HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
October 23, 2025
Authors: Yihao Meng, Hao Ouyang, Yue Yu, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Hanlin Wang, Yixuan Li, Cheng Chen, Yanhong Zeng, Yujun Shen, Huamin Qu
cs.AI
Abstract
State-of-the-art text-to-video models excel at generating isolated clips but
fall short of creating the coherent, multi-shot narratives that are the
essence of storytelling. We bridge this "narrative gap" with HoloCine, a model
that generates entire scenes holistically to ensure global consistency from the
first shot to the last. Our architecture achieves precise directorial control
through a Window Cross-Attention mechanism that localizes text prompts to
specific shots, while a Sparse Inter-Shot Self-Attention pattern (dense within
shots but sparse between them) ensures the efficiency required for minute-scale
generation. Beyond setting a new state-of-the-art in narrative coherence,
HoloCine develops remarkable emergent abilities: a persistent memory for
characters and scenes, and an intuitive grasp of cinematic techniques. Our work
marks a pivotal shift from clip synthesis towards automated filmmaking, making
end-to-end cinematic creation a tangible future. Our code is available at:
https://holo-cine.github.io/.
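The two attention patterns named above can be illustrated with boolean masks. The sketch below is a minimal, hypothetical reconstruction for illustration only: the paper does not specify this exact anchoring rule, and the function names, the `anchors_per_shot` parameter, and the choice of shot-initial tokens as cross-shot anchors are all assumptions.

```python
import numpy as np

def sparse_inter_shot_mask(shot_lengths, anchors_per_shot=1):
    """Self-attention mask: dense within each shot, sparse across shots.

    Cross-shot attention is restricted to the first `anchors_per_shot`
    tokens of every shot (a hypothetical anchoring rule chosen for
    illustration, not the paper's exact pattern).
    """
    n = sum(shot_lengths)
    mask = np.zeros((n, n), dtype=bool)
    starts = np.cumsum([0] + list(shot_lengths))[:-1]
    for s, length in zip(starts, shot_lengths):
        # Dense attention among all tokens of the same shot.
        mask[s:s + length, s:s + length] = True
        # Sparse cross-shot links: every token may attend to a few
        # anchor tokens at the start of each shot.
        mask[:, s:s + anchors_per_shot] = True
    return mask

def window_cross_attention_mask(shot_lengths, prompt_lengths):
    """Cross-attention mask localizing each shot to its own text prompt.

    Video tokens of shot i (rows) may attend only to the text tokens
    of prompt i (columns), which is one simple way to realize
    per-shot prompt localization.
    """
    mask = np.zeros((sum(shot_lengths), sum(prompt_lengths)), dtype=bool)
    v_starts = np.cumsum([0] + list(shot_lengths))[:-1]
    t_starts = np.cumsum([0] + list(prompt_lengths))[:-1]
    for (v, lv), (t, lt) in zip(zip(v_starts, shot_lengths),
                                zip(t_starts, prompt_lengths)):
        mask[v:v + lv, t:t + lt] = True
    return mask
```

Because cross-shot attention touches only a constant number of anchor tokens per shot, the mask density grows roughly linearly in the number of shots rather than quadratically in total sequence length, which is the kind of saving minute-scale generation needs.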