MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Abstract

Recent advances in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, both of which are essential for long-form video production such as movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progressions and high visual fidelity. Our approach uses autoregressive models to maintain global narrative coherence, predicting sequences of visual tokens that are subsequently transformed into high-quality video frames through diffusion rendering. This is akin to traditional movie production, where a complex story is decomposed into manageable scenes that are captured individually. Further, we employ a multimodal script that enriches scene descriptions with detailed character information and visual style, enhancing continuity and character identity across scenes. We present extensive experiments across various movie genres, demonstrating that our approach not only achieves superior visual and narrative quality but also extends the duration of generated content significantly beyond current capabilities. Homepage: https://aim-uofa.github.io/MovieDreamer/.
