

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

March 20, 2024
Authors: Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun
cs.AI

Abstract

Sora is the first large-scale generalist video generation model to garner significant attention across society. Since its launch by OpenAI in February 2024, no other video generation model has matched Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, only a few video generation models have been fully published; the majority are closed-source. To address this gap, this paper proposes Mora, a new multi-agent framework that incorporates several advanced visual AI agents to replicate the generalist video generation demonstrated by Sora. In particular, Mora can use multiple visual agents to successfully mimic Sora's video generation capabilities across various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extending generated videos, (4) video-to-video editing, (5) connecting videos, and (6) simulating digital worlds. Our extensive experimental results show that Mora achieves performance close to that of Sora in various tasks. However, an obvious performance gap remains between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.
