EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

May 29, 2024
作者: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang
cs.AI

Abstract

This paper presents EasyAnimate, an advanced method for video generation that leverages the power of the transformer architecture for high-performance outcomes. We have expanded the DiT framework, originally designed for 2D image synthesis, to accommodate the complexities of 3D video generation by incorporating a motion module block. This module captures temporal dynamics, ensuring the production of consistent frames and seamless motion transitions. The motion module can be adapted to various DiT baseline methods to generate videos in different styles. It can also handle different frame rates and resolutions during both training and inference, and works with both images and videos. Moreover, we introduce slice VAE, a novel approach that condenses the temporal axis, facilitating the generation of long-duration videos. Currently, EasyAnimate can generate videos of up to 144 frames. We provide a holistic ecosystem for video production based on DiT, encompassing data pre-processing, VAE training, DiT model training (both baseline and LoRA models), and end-to-end video inference. Code is available at: https://github.com/aigc-apps/EasyAnimate. We are continuously working to enhance the performance of our method.
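The abstract describes a motion module that attends across the frame axis so that the same spatial location stays consistent over time. The paper's actual module is not shown here; the following is a minimal numpy sketch of the underlying idea (temporal self-attention applied per spatial token), with all function names and weight shapes being illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(video, wq, wk, wv):
    """Attend across frames independently at each spatial location.

    video: (frames, tokens, channels) — one sample with spatial
    positions flattened into tokens.
    """
    f, t, c = video.shape
    # Treat each spatial token as a batch element; attention runs over frames,
    # which is what ties the frames together and smooths motion transitions.
    x = video.transpose(1, 0, 2)                      # (tokens, frames, channels)
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(c), axis=-1)
    out = attn @ v                                    # (tokens, frames, channels)
    return out.transpose(1, 0, 2)                     # (frames, tokens, channels)

rng = np.random.default_rng(0)
f, t, c = 8, 16, 32                                   # toy sizes, not the paper's
video = rng.normal(size=(f, t, c))
wq, wk, wv = (rng.normal(size=(c, c)) * 0.1 for _ in range(3))
out = temporal_self_attention(video, wq, wk, wv)
print(out.shape)  # (8, 16, 32)
```

Because the attention axis is time rather than space, a block like this can be inserted after the spatial layers of an existing 2D DiT, which is consistent with the abstract's claim that the module adapts to various DiT baselines.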
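The slice VAE is described only as condensing the temporal axis to enable long videos. A plausible reading is that the encoder processes the video in short temporal slices and concatenates the per-slice latents, bounding memory regardless of clip length. The sketch below illustrates that slicing pattern with a toy stand-in encoder; the function name, the slice length of 4, and the averaging "encoder" are all assumptions, not the paper's design:

```python
import numpy as np

def encode_in_temporal_slices(video, encode_fn, slice_len=4):
    """Encode a long video slice-by-slice along the time axis, then
    concatenate the per-slice latents. Here each slice of `slice_len`
    frames maps to one latent frame (an assumed compression factor)."""
    chunks = [video[i:i + slice_len] for i in range(0, video.shape[0], slice_len)]
    latents = [encode_fn(chunk) for chunk in chunks]
    return np.concatenate(latents, axis=0)

# Toy "encoder": average each slice over time — a stand-in for a learned
# 3D VAE encoder, used only to make the slicing mechanics concrete.
toy_encode = lambda chunk: chunk.mean(axis=0, keepdims=True)

# 144 frames (the length the abstract reports) of a 2x2 "video".
video = np.arange(144 * 2 * 2, dtype=float).reshape(144, 2, 2)
latent = encode_in_temporal_slices(video, toy_encode, slice_len=4)
print(latent.shape)  # (36, 2, 2)
```

The key property is that peak memory depends on `slice_len`, not on the total frame count, which is what makes 144-frame generation tractable.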
