MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Abstract

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architectural designs, MagicVideo-V2 can generate aesthetically pleasing, high-resolution videos with remarkable fidelity and smoothness. In a large-scale user evaluation, it outperforms leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.
