MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
January 9, 2024
Authors: Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
cs.AI
Abstract
The growing demand for high-fidelity video generation from textual
descriptions has catalyzed significant research in this field. In this work, we
introduce MagicVideo-V2, which integrates a text-to-image model, a video motion
generator, a reference image embedding module, and a frame interpolation module
into an end-to-end video generation pipeline. Thanks to this architecture,
MagicVideo-V2 can generate aesthetically pleasing, high-resolution video with
remarkable fidelity and smoothness. In a large-scale user evaluation, it
outperforms leading text-to-video systems such as Runway, Pika 1.0, Morph,
Moon Valley, and the Stable Video Diffusion model.
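
To make the multi-stage idea concrete, the following is a minimal toy sketch of how four such modules might be chained. It is not the MagicVideo-V2 implementation: the stage order, every function name, and all parameters are assumptions made for illustration, and each stage is a trivial numeric placeholder rather than a learned model.

```python
from typing import List

# Toy stand-in for an image: a flat list of pixel intensities in [0, 1].
Frame = List[float]


def text_to_image(prompt: str, size: int = 4) -> Frame:
    """Hypothetical stage 1: derive a deterministic toy 'image' from the prompt."""
    seed = sum(ord(c) for c in prompt)
    return [((seed + i) % 256) / 255.0 for i in range(size)]


def embed_reference(image: Frame) -> float:
    """Hypothetical reference embedding: here just the mean intensity."""
    return sum(image) / len(image)


def generate_motion(image: Frame, ref: float, num_keyframes: int = 4) -> List[Frame]:
    """Hypothetical stage 2: produce keyframes that start at the reference
    image and drift toward the reference embedding value."""
    frames = []
    for k in range(num_keyframes):
        t = 0.5 * k / max(num_keyframes - 1, 1)
        frames.append([(1 - t) * p + t * ref for p in image])
    return frames


def interpolate_frames(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Hypothetical final stage: linear blending that inserts (factor - 1)
    frames between each pair of neighbouring keyframes."""
    if len(frames) < 2:
        return list(frames)
    out = []
    for a, b in zip(frames, frames[1:]):
        for s in range(factor):
            w = s / factor
            out.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    out.append(frames[-1])
    return out


def generate_video(prompt: str, num_keyframes: int = 4, factor: int = 2) -> List[Frame]:
    """Chain the stages end to end (assumed order, for illustration only)."""
    image = text_to_image(prompt)
    ref = embed_reference(image)
    keyframes = generate_motion(image, ref, num_keyframes)
    return interpolate_frames(keyframes, factor)
```

In this sketch, calling `generate_video("a cat", num_keyframes=4, factor=2)` yields (4 - 1) * 2 + 1 = 7 frames, and the first frame equals the toy reference image. The point is only the shape of the data flow: the prompt produces an image, the image conditions motion generation, and interpolation raises the frame count for smoothness.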