NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

Abstract

We propose NaRCan, a video editing framework that integrates a hybrid deformation field with a diffusion prior to generate high-quality, natural canonical images that represent the input video. Our approach uses a homography to model global motion and multi-layer perceptrons (MLPs) to capture local residual deformations, improving the model's ability to handle complex video dynamics. Because the diffusion prior is introduced from the early stages of training, the generated canonical images retain a high-quality, natural appearance, which makes them suitable for various downstream video editing tasks, a capability that current canonical-based methods do not achieve. Furthermore, we incorporate low-rank adaptation (LoRA) fine-tuning and introduce a noise and diffusion prior update scheduling technique that accelerates training by 14 times. Extensive experiments show that our method outperforms existing approaches on various video editing tasks and produces coherent, high-quality edited video sequences. Video results are available on our project page: https://koi953215.github.io/NaRCan_page/.
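To make the hybrid deformation field more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a 3x3 homography maps each pixel to a global position and that a small MLP adds a residual offset computed from (x, y, t). The names `apply_homography`, `ResidualMLP`, and `hybrid_deform` are hypothetical, as are the network size, the MLP inputs, and the random untrained weights; the abstract does not specify any of these details.

```python
import math
import random


def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # perspective divide


class ResidualMLP:
    """Tiny (x, y, t) -> (dx, dy) network with one tanh hidden layer.

    Weights are random placeholders (assumption); a real model would be trained.
    """

    def __init__(self, hidden=8, scale=0.01, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Small output weights keep the residual a minor correction.
        self.w2 = [[rng.uniform(-1, 1) * scale for _ in range(hidden)]
                   for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def __call__(self, x, y, t):
        inp = (x, y, t)
        h = [math.tanh(sum(w * v for w, v in zip(row, inp)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(self.w2, self.b2)]
        return out[0], out[1]


def hybrid_deform(H, mlp, x, y, t):
    """Global homography warp plus a local residual offset from the MLP."""
    gx, gy = apply_homography(H, x, y)
    dx, dy = mlp(x, y, t)
    return gx + dx, gy + dy


# Usage: a pure-translation homography and an untrained residual network.
H_shift = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]
canonical_xy = hybrid_deform(H_shift, ResidualMLP(), 3.0, 4.0, 0.5)
```

In this sketch the homography carries most of the motion and the MLP contributes only a small, bounded correction, which reflects the global/local split described in the abstract.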
