NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
June 10, 2024
Authors: Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu
cs.AI
Abstract
We propose a video editing framework, NaRCan, which integrates a hybrid
deformation field with a diffusion prior to generate high-quality, natural
canonical images that represent the input video. Our approach utilizes homography
to model global motion and employs multi-layer perceptrons (MLPs) to capture
local residual deformations, enhancing the model's ability to handle complex
video dynamics. By introducing a diffusion prior from the early stages of
training, our model ensures that the generated images retain a high-quality
natural appearance, making the produced canonical images suitable for various
downstream tasks in video editing, a capability not achieved by current
canonical-based methods. Furthermore, we incorporate low-rank adaptation (LoRA)
fine-tuning and introduce a noise and diffusion-prior update scheduling
technique that accelerates the training process by a factor of 14. Extensive
experimental results show that our method outperforms existing approaches in
various video editing tasks and produces coherent and high-quality edited video
sequences. See our project page for video results at
https://koi953215.github.io/NaRCan_page/.
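
To make the hybrid deformation field concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a learnable per-frame homography models global motion, and a small MLP predicts local residual offsets on top of it. This is an illustrative reconstruction, not the authors' released code; the names `HybridDeformationField` and `ResidualMLP` are hypothetical.

```python
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    """Small MLP that predicts local residual offsets from (x, y, t)."""
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        dims = [3] + [hidden] * depth + [2]
        layers = []
        for i in range(len(dims) - 1):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, xyt):          # xyt: (N, 3), coords in [-1, 1]
        return self.net(xyt)         # (N, 2) residual offsets

class HybridDeformationField(nn.Module):
    """Maps frame coordinates to canonical-image coordinates."""
    def __init__(self, num_frames):
        super().__init__()
        # One learnable 3x3 homography per frame, initialized to
        # identity, capturing global motion.
        self.H = nn.Parameter(torch.eye(3).repeat(num_frames, 1, 1))
        self.residual = ResidualMLP()

    def forward(self, xy, frame_idx, t_norm):
        # xy: (N, 2) pixel coords in [-1, 1]; frame_idx: (N,) long
        # tensor of frame indices; t_norm: (N, 1) time in [-1, 1].
        ones = torch.ones_like(xy[:, :1])
        p = torch.cat([xy, ones], dim=-1).unsqueeze(-1)  # (N, 3, 1)
        q = (self.H[frame_idx] @ p).squeeze(-1)          # (N, 3)
        global_xy = q[:, :2] / q[:, 2:3]                 # perspective divide
        # Local residual deformation on top of the global warp.
        offset = self.residual(torch.cat([global_xy, t_norm], dim=-1))
        return global_xy + offset                        # canonical coords
```

Sampling the canonical image at these coordinates (e.g. with `torch.nn.functional.grid_sample`) reconstructs each frame, so an edit applied once to the canonical image propagates consistently to the whole video.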
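The abstract also mentions applying a diffusion prior from early in training with an update schedule. The following is a hedged sketch of how such a combined objective might look: the canonical image is periodically refined by a (LoRA fine-tuned) diffusion model, and the refined result regularizes training between updates. The interval, weight, and `diffusion_refine` callable are illustrative assumptions, not values or APIs from the paper.

```python
import torch

def training_losses(step, canonical, rendered, frames,
                    diffusion_refine, state,
                    update_every=500, prior_weight=0.1):
    # Per-frame reconstruction loss between rendered and input frames.
    recon = ((rendered - frames) ** 2).mean()
    # Periodically refresh the prior target by refining the current
    # canonical image with the diffusion model (illustrative schedule).
    if step % update_every == 0 or state.get("target") is None:
        with torch.no_grad():
            state["target"] = diffusion_refine(canonical)
    # Pull the canonical image toward its refined, natural-looking target.
    prior = ((canonical - state["target"]) ** 2).mean()
    return recon + prior_weight * prior
```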