MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

July 23, 2024
Authors: Canyu Zhao, Mingyu Liu, Wen Wang, Jianlong Yuan, Hao Chen, Bo Zhang, Chunhua Shen
cs.AI

Abstract

Recent advances in video generation have relied primarily on diffusion models to produce short-duration content. However, these approaches often fall short in modeling complex narratives and in maintaining character consistency over extended periods, both of which are essential for long-form video production such as movies. We propose MovieDreamer, a novel hierarchical framework that combines the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progression and high visual fidelity. Our approach uses autoregressive models for global narrative coherence, predicting sequences of visual tokens that are subsequently transformed into high-quality video frames through diffusion rendering. This mirrors traditional movie production, where a complex story is factorized into manageable scenes that are captured individually. Further, we employ a multimodal script that enriches scene descriptions with detailed character information and visual style, enhancing continuity and character identity across scenes. We present extensive experiments across various movie genres, demonstrating that our approach not only achieves superior visual and narrative quality but also significantly extends the duration of generated content beyond current capabilities. Homepage: https://aim-uofa.github.io/MovieDreamer/.
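
The abstract describes a two-level pipeline: an autoregressive stage that predicts a sequence of visual tokens for global narrative coherence, and a diffusion stage that renders those tokens into frames, with each step driven by a multimodal script. The toy Python sketch below illustrates only that control flow and is not the authors' implementation. All names (`SceneScript`, `generate_tokens`, `render`, `toy_predict`, `toy_decode`) are hypothetical, tokens are plain integers, and label strings stand in for diffusion-rendered frames.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical stand-in for a visual token; real systems use learned
# embeddings or codebook ids, not plain integers.
Token = int


@dataclass
class SceneScript:
    """Hypothetical multimodal script entry: scene text, characters, style."""
    description: str
    characters: List[str] = field(default_factory=list)
    style: str = "default"


def generate_tokens(script: Sequence[SceneScript],
                    predict: Callable[[List[Token], SceneScript], Token]) -> List[Token]:
    """Autoregressive stage: each new token is conditioned on the current
    script entry and on every token generated so far (global context)."""
    history: List[Token] = []
    for scene in script:
        history.append(predict(list(history), scene))
    return history


def render(tokens: Sequence[Token], decode: Callable[[Token], str]) -> List[str]:
    """Rendering stage: `decode` stands in for a diffusion renderer that
    would turn each token into actual video frames."""
    return [decode(t) for t in tokens]


def toy_predict(history: List[Token], scene: SceneScript) -> Token:
    # Toy rule, NOT a learned model: mixes the full history with the scene
    # entry (ignores `style` for brevity).
    return (sum(history) + len(scene.description) + len(scene.characters)) % 1000


def toy_decode(token: Token) -> str:
    # Toy "frame": a label string instead of pixels.
    return f"frame<{token}>"


script = [
    SceneScript("A detective enters a rainy alley", ["Ava"], "noir"),
    SceneScript("Ava finds a clue", ["Ava"], "noir"),
]
frames = render(generate_tokens(script, toy_predict), toy_decode)
print(frames)  # ['frame<33>', 'frame<50>']
```

Because every call to `predict` sees the full token history, editing an early scene changes all later tokens, which is the property the autoregressive stage is meant to supply for narrative coherence. In this toy, the rendering stage decodes each token independently, which is a simplification.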
