MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Abstract

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Thanks to this architecture design, MagicVideo-V2 can generate aesthetically pleasing, high-resolution video with remarkable fidelity and smoothness. In a large-scale user evaluation, it outperforms leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.

