MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

January 9, 2024
Authors: Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
cs.AI

Abstract

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architectural designs, MagicVideo-V2 can generate aesthetically pleasing, high-resolution videos with remarkable fidelity and smoothness. In a large-scale user evaluation, it demonstrates superior performance over leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.
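To make the multi-stage idea concrete, the following is a minimal toy sketch of how such stages could be chained. It is not the authors' implementation: every function name, parameter, and data type here is hypothetical, the "models" are deterministic placeholders, and the stage ordering is an assumption based only on the module list in the abstract. The reference image embedding module is represented loosely by passing the generated image into the motion stage as conditioning.

```python
from typing import List

# Toy stand-in for an image: a 2D grid of grayscale values in [0, 1].
Frame = List[List[float]]


def text_to_image(prompt: str, size: int = 4) -> Frame:
    """Hypothetical T2I stage: derive a deterministic toy image from the prompt."""
    seed = sum(ord(ch) for ch in prompt) % 256
    return [
        [((seed + r * size + c) % 256) / 255.0 for c in range(size)]
        for r in range(size)
    ]


def generate_motion(reference: Frame, num_keyframes: int = 4) -> List[Frame]:
    """Hypothetical motion stage: produce keyframes conditioned on the reference.

    A real system would run a video model here; this placeholder just
    brightens the reference image a little more at each time step.
    """
    keyframes = []
    for t in range(num_keyframes):
        shift = 0.1 * t
        keyframes.append([[min(1.0, v + shift) for v in row] for row in reference])
    return keyframes


def interpolate_frames(keyframes: List[Frame], factor: int = 2) -> List[Frame]:
    """Hypothetical interpolation stage: linear blend between neighboring keyframes."""
    if not keyframes:
        return []
    out: List[Frame] = []
    for a, b in zip(keyframes, keyframes[1:]):
        for k in range(factor):
            w = k / factor  # w = 0 reproduces keyframe `a` exactly
            out.append(
                [[(1 - w) * x + w * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
            )
    out.append(keyframes[-1])  # keep the final keyframe
    return out


def generate_video(prompt: str, num_keyframes: int = 4, factor: int = 2) -> List[Frame]:
    """Chain the placeholder stages: text -> image -> keyframes -> smoother video."""
    reference = text_to_image(prompt)
    keyframes = generate_motion(reference, num_keyframes)
    return interpolate_frames(keyframes, factor)


if __name__ == "__main__":
    video = generate_video("a cat on a skateboard")
    print(f"generated {len(video)} frames")
```

The point of the sketch is only the composition: each stage consumes the previous stage's output, so an individual stage could in principle be swapped for a stronger model without changing the overall flow.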