

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

April 7, 2024
Authors: Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
cs.AI

Abstract

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions. A largely overlooked problem in T2V is that existing models have not adequately encoded physical knowledge of the real world, so the videos they generate tend to show limited motion and little variation. In this paper, we propose MagicTime, a metamorphic time-lapse video generation model that learns real-world physical knowledge from time-lapse videos and implements metamorphic generation. First, we design a MagicAdapter scheme that decouples spatial and temporal training, encodes more physical knowledge from metamorphic videos, and transforms pre-trained T2V models to generate metamorphic videos. Second, we introduce a Dynamic Frames Extraction strategy adapted to metamorphic time-lapse videos, which have a wider variation range and cover dramatic object metamorphic processes, and therefore embody more physical knowledge than general videos. Finally, we introduce a Magic Text-Encoder to improve the understanding of metamorphic video prompts. Furthermore, we create a time-lapse video-text dataset called ChronoMagic, specifically curated to unlock metamorphic video generation ability. Extensive experiments demonstrate the superiority and effectiveness of MagicTime for generating high-quality and dynamic metamorphic videos, suggesting that time-lapse video generation is a promising path toward building metamorphic simulators of the physical world.
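The abstract names the Dynamic Frames Extraction strategy but does not specify it. To illustrate the intuition it describes (a metamorphic clip has to be sampled end to end to capture the whole transformation), here is a minimal Python sketch that contrasts a fixed-stride contiguous window, assumed here as a stand-in for typical general-video sampling, with evenly spaced sampling across the full video. The functions `contiguous_indices`, `dynamic_frame_indices`, and `temporal_coverage` are hypothetical names introduced for this example; this is not the authors' implementation, and the actual strategy may differ.

```python
def contiguous_indices(num_frames: int, clip_len: int, stride: int = 1, start: int = 0) -> list[int]:
    """Hypothetical baseline: a fixed-stride window of clip_len frames.

    Assumed to resemble sampling used for general videos; it only sees
    a short slice of a long time-lapse.
    """
    span = (clip_len - 1) * stride + 1
    if start < 0 or start + span > num_frames:
        raise ValueError("window does not fit inside the video")
    return [start + i * stride for i in range(clip_len)]


def dynamic_frame_indices(num_frames: int, clip_len: int) -> list[int]:
    """Hypothetical sketch: clip_len indices spread evenly over the whole video.

    The first and last frames are always included, so the sampled clip
    spans the full metamorphic process (integer arithmetic avoids float
    rounding issues).
    """
    if clip_len < 1 or num_frames < clip_len:
        raise ValueError("need 1 <= clip_len <= num_frames")
    if clip_len == 1:
        return [0]
    return [(i * (num_frames - 1)) // (clip_len - 1) for i in range(clip_len)]


def temporal_coverage(indices: list[int], num_frames: int) -> float:
    """Fraction of the video's timeline between the first and last sampled frame."""
    if num_frames < 2:
        return 1.0
    return (indices[-1] - indices[0]) / (num_frames - 1)


if __name__ == "__main__":
    n, k = 300, 16  # illustrative numbers, not taken from the paper
    window = contiguous_indices(n, k)
    spread = dynamic_frame_indices(n, k)
    print("contiguous:", window, f"coverage={temporal_coverage(window, n):.3f}")
    print("dynamic:   ", spread, f"coverage={temporal_coverage(spread, n):.3f}")
```

With a 300-frame time-lapse and 16-frame clips, the contiguous window covers only about 5% of the timeline, while the evenly spaced sample runs from the first frame to the last. That end-to-end coverage is the property the abstract attributes to metamorphic clips.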