MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

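Abstract

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architectural designs, MagicVideo-V2 can generate aesthetically pleasing, high-resolution videos with remarkable fidelity and smoothness. In a large-scale user evaluation, it outperforms leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.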

