Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

March 20, 2024
Authors: Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun
cs.AI

Abstract

Sora is the first large-scale generalist video generation model to garner significant attention across society. Since its launch by OpenAI in February 2024, no other video generation model has matched Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, only a few video generation models have been fully published, and the majority are closed-source. To address this gap, this paper proposes Mora, a new multi-agent framework that incorporates several advanced visual AI agents to replicate the generalist video generation demonstrated by Sora. In particular, Mora can use multiple visual agents to mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extending generated videos, (4) video-to-video editing, (5) connecting videos, and (6) simulating digital worlds. Our extensive experimental results show that Mora achieves performance close to that of Sora in various tasks. However, there is an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.
