Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts

Abstract

Text-driven image and video diffusion models have achieved unprecedented success in generating realistic and diverse content. Recently, the editing and variation of existing images and videos with diffusion-based generative models have garnered significant attention. However, previous works are limited to editing content with text or providing coarse personalization from a single visual clue, which makes them unsuitable for hard-to-describe content that requires fine-grained, detailed control. To address this, we propose Make-A-Protagonist, a generic video editing framework that uses textual and visual clues to edit videos, with the goal of empowering individuals to become the protagonists. Specifically, we leverage multiple experts to parse the source video and the target visual and textual clues, and we propose a visual-textual-based video generation model that employs mask-guided denoising sampling to generate the desired output. Extensive results demonstrate the versatile and remarkable editing capabilities of Make-A-Protagonist.
