Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
May 15, 2023
Authors: Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee
cs.AI
Abstract
The text-driven image and video diffusion models have achieved unprecedented
success in generating realistic and diverse content. Recently, the editing and
variation of existing images and videos in diffusion-based generative models
have garnered significant attention. However, previous works are limited to
editing content with text or providing coarse personalization using a single
visual clue, rendering them unsuitable for indescribable content that requires
fine-grained and detailed control. In this regard, we propose a generic video
editing framework called Make-A-Protagonist, which utilizes textual and visual
clues to edit videos with the goal of empowering individuals to become the
protagonists. Specifically, we leverage multiple experts to parse the source
video and the target visual and textual clues, and propose a visual-textual video
generation model that employs mask-guided denoising sampling to generate the
desired output. Extensive results demonstrate the versatile and remarkable
editing capabilities of Make-A-Protagonist.
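The mask-guided denoising sampling mentioned above can be illustrated with a minimal sketch: at each denoising step, regions inside the edit mask follow the generative model's prediction, while regions outside it are reset to a re-noised copy of the source. This is an assumption-laden toy version, not the paper's implementation; `mask_guided_denoise` and `denoise_step` are hypothetical names, and the noise schedule is simplified.

```python
import numpy as np

def mask_guided_denoise(source, mask, denoise_step, num_steps=50, seed=0):
    """Toy mask-guided denoising sampler (illustrative, not the paper's code).

    source       : array with the original (unedited) content
    mask         : same-shaped array, 1 where edits are allowed, 0 elsewhere
    denoise_step : callable (x, t) -> x, one model denoising step
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(source.shape)  # start from pure noise
    for t in range(num_steps, 0, -1):
        # Model proposes edited content for the whole frame.
        x = denoise_step(x, t)
        # Re-noise the source to the current noise level (simplified schedule).
        noise_level = (t - 1) / num_steps
        source_noised = source + noise_level * rng.standard_normal(source.shape)
        # Keep edits inside the mask; restore source content outside it.
        x = mask * x + (1.0 - mask) * source_noised
    return x
```

With a stand-in `denoise_step` that simply returns a target image, the output converges to the target inside the mask and the source outside it, which is the behavior the abstract describes: fine-grained control over *where* the protagonist is replaced.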