Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
May 15, 2023
Authors: Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee
cs.AI
Abstract
The text-driven image and video diffusion models have achieved unprecedented
success in generating realistic and diverse content. Recently, the editing and
variation of existing images and videos in diffusion-based generative models
have garnered significant attention. However, previous works are limited to
editing content with text or providing coarse personalization using a single
visual clue, rendering them unsuitable for indescribable content that requires
fine-grained and detailed control. In this regard, we propose a generic video
editing framework called Make-A-Protagonist, which utilizes textual and visual
clues to edit videos with the goal of empowering individuals to become the
protagonists. Specifically, we leverage multiple experts to parse the source
video and the target visual and textual clues, and propose a visual-textual video
generation model that employs mask-guided denoising sampling to generate the
desired output. Extensive results demonstrate the versatile and remarkable
editing capabilities of Make-A-Protagonist.
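The mask-guided denoising sampling mentioned above can be illustrated with a minimal sketch: at each denoising step, regions inside the edit mask follow the generative model's prediction, while regions outside it are reset to a re-noised copy of the source. This is an assumption-laden toy version, not the paper's implementation; `mask_guided_denoise` and `denoise_step` are hypothetical names, and the noise schedule is simplified.

```python
import numpy as np

def mask_guided_denoise(source, mask, denoise_step, num_steps=50, seed=0):
    """Toy mask-guided denoising sampler (illustrative, not the paper's code).

    source       : array with the original (unedited) content
    mask         : same-shaped array, 1 where edits are allowed, 0 elsewhere
    denoise_step : callable (x, t) -> x, one model denoising step
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(source.shape)  # start from pure noise
    for t in range(num_steps, 0, -1):
        # Model proposes edited content for the whole frame.
        x = denoise_step(x, t)
        # Re-noise the source to the current noise level (simplified schedule).
        noise_level = (t - 1) / num_steps
        source_noised = source + noise_level * rng.standard_normal(source.shape)
        # Keep edits inside the mask; restore source content outside it.
        x = mask * x + (1.0 - mask) * source_noised
    return x
```

With a stand-in `denoise_step` that simply returns a target image, the output converges to the target inside the mask and the source outside it, which is the behavior the abstract describes: fine-grained control over *where* the protagonist is replaced.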