I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Abstract

The remarkable generative capabilities of diffusion models have motivated extensive research in both image and video editing. Compared with video editing, which faces additional challenges in the temporal dimension, image editing has seen the development of more diverse, higher-quality approaches and more capable software such as Photoshop. In light of this gap, we introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model. Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits, effectively handling global edits, local edits, and moderate shape changes, which existing methods cannot fully achieve. At the core of our method are two main processes: Coarse Motion Extraction, which aligns basic motion patterns with the original video, and Appearance Refinement, which makes precise adjustments using fine-grained attention matching. We also incorporate a skip-interval strategy to mitigate quality degradation from auto-regressive generation across multiple video clips. Experimental results demonstrate our framework's superior performance in fine-grained video editing and its ability to produce high-quality, temporally consistent outputs.
