MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
April 7, 2024
作者: Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
cs.AI
Abstract
Recent advances in Text-to-Video generation (T2V) have achieved remarkable
success in synthesizing high-quality general videos from textual descriptions.
A largely overlooked problem in T2V is that existing models have not adequately
encoded physical knowledge of the real world, so the generated videos tend to
have limited motion and poor variation. In this paper, we propose
MagicTime, a metamorphic time-lapse video generation model, which
learns real-world physics knowledge from time-lapse videos and implements
metamorphic generation. First, we design a MagicAdapter scheme to decouple
spatial and temporal training, encode more physical knowledge from metamorphic
videos, and transform pre-trained T2V models to generate metamorphic videos.
Second, we introduce a Dynamic Frames Extraction strategy to adapt to
metamorphic time-lapse videos, which have a wider variation range and cover
dramatic object metamorphic processes, thus embodying more physical knowledge
than general videos. Finally, we introduce a Magic Text-Encoder to improve the
understanding of metamorphic video prompts. Furthermore, we create a time-lapse
video-text dataset called ChronoMagic, specifically curated to unlock
the metamorphic video generation ability. Extensive experiments demonstrate the
superiority and effectiveness of MagicTime for generating high-quality and
dynamic metamorphic videos, suggesting time-lapse video generation is a
promising path toward building metamorphic simulators of the physical world.
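The abstract does not give implementation details for the Dynamic Frames Extraction strategy. The sketch below is a minimal, hypothetical illustration, assuming the strategy amounts to sampling a fixed number of frames spread across a time-lapse clip's full duration so that the complete metamorphic process (e.g., a seed growing into a plant) is represented rather than a short temporal window. The function name `sample_frames`, the `num_frames` parameter, and the use of OpenCV are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a dynamic frame-sampling idea (not MagicTime's exact code):
# pick `num_frames` indices spread across the whole clip so a time-lapse video's
# full metamorphic process is covered in the sampled frames.
import cv2          # pip install opencv-python
import numpy as np


def sample_frames(video_path: str, num_frames: int = 16) -> np.ndarray:
    """Return `num_frames` RGB frames evenly spanning the video's duration."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Indices stretched over the entire clip, not a short window, so that
    # long-range, dramatic changes appear in the training sample.
    indices = np.linspace(0, total - 1, num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)


# Example usage:
# frames = sample_frames("flower_timelapse.mp4", num_frames=16)
```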