Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
May 15, 2023
Authors: Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee
cs.AI
Abstract
Text-driven image and video diffusion models have achieved unprecedented
success in generating realistic and diverse content. Recently, the editing and
variation of existing images and videos in diffusion-based generative models
have garnered significant attention. However, previous works are limited to
editing content with text or providing coarse personalization using a single
visual clue, rendering them unsuitable for indescribable content that requires
fine-grained and detailed control. In this regard, we propose a generic video
editing framework called Make-A-Protagonist, which utilizes textual and visual
clues to edit videos with the goal of empowering individuals to become the
protagonists. Specifically, we leverage multiple experts to parse the source video
and the target visual and textual clues, and propose a visual-textual-based video
generation model that employs mask-guided denoising sampling to generate the
desired output. Extensive results demonstrate the versatile and remarkable
editing capabilities of Make-A-Protagonist.
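The abstract names mask-guided denoising sampling without spelling out the update. The following is a minimal, hypothetical sketch of one common form of it, assuming two noise predictions (one conditioned on the protagonist's visual and textual clues, one on the background text) that are blended by a per-pixel mask in [0, 1] and then passed to a deterministic DDIM step. The function names, the blending rule, and the use of DDIM are illustrative assumptions, not the authors' implementation; plain Python lists stand in for latent tensors.

```python
import math
from typing import List

Vector = List[float]  # hypothetical stand-in for a flattened latent tensor


def mask_guided_eps(eps_protagonist: Vector, eps_background: Vector, mask: Vector) -> Vector:
    """Blend two noise predictions with a soft mask (assumed blending rule).

    mask[i] == 1 keeps the protagonist-conditioned prediction at position i;
    mask[i] == 0 keeps the background-conditioned prediction.
    """
    if not (len(eps_protagonist) == len(eps_background) == len(mask)):
        raise ValueError("inputs must have the same length")
    return [m * a + (1.0 - m) * b for a, b, m in zip(eps_protagonist, eps_background, mask)]


def ddim_step(x_t: Vector, eps: Vector, alpha_bar_t: float, alpha_bar_prev: float) -> Vector:
    """One deterministic DDIM update (eta = 0) from step t to the previous step.

    alpha_bar_* are cumulative products of the noise schedule's alphas.
    """
    out = []
    for x, e in zip(x_t, eps):
        # Predict the clean sample from the current noisy sample and the noise estimate.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        # Re-noise the prediction to the previous (less noisy) timestep.
        out.append(math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * e)
    return out


if __name__ == "__main__":
    # Toy two-element latent: position 0 is "protagonist", position 1 is "background".
    eps = mask_guided_eps([0.5, 0.5], [-0.5, -0.5], [1.0, 0.0])
    print(eps)
    print(ddim_step([1.0, 1.0], eps, alpha_bar_t=0.25, alpha_bar_prev=0.5))
```

In a real sampler the two predictions would come from the same denoising network run under different conditioning, and this blend-then-step loop would repeat over all timesteps; the mask keeps the edit localized to the protagonist region while the background follows its own conditioning.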