NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

June 10, 2024
Authors: Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu
cs.AI

Abstract

We propose a video editing framework, NaRCan, which integrates a hybrid deformation field and a diffusion prior to generate high-quality natural canonical images that represent the input video. Our approach utilizes homography to model global motion and employs multi-layer perceptrons (MLPs) to capture local residual deformations, enhancing the model's ability to handle complex video dynamics. By introducing a diffusion prior from the early stages of training, our model ensures that the generated images retain a high-quality natural appearance, making the produced canonical images suitable for various downstream tasks in video editing, a capability not achieved by current canonical-based methods. Furthermore, we incorporate low-rank adaptation (LoRA) fine-tuning and introduce a noise and diffusion prior update scheduling technique that accelerates the training process by 14 times. Extensive experimental results show that our method outperforms existing approaches in various video editing tasks and produces coherent and high-quality edited video sequences. See our project page for video results at https://koi953215.github.io/NaRCan_page/.
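To make the hybrid deformation field concrete, here is a minimal PyTorch-style sketch, not the authors' code: a learnable per-frame 3x3 homography models global motion, and a small MLP predicts per-pixel residual offsets. All class names, parameter names, and dimensions (`HybridDeformationField`, `hidden_dim`, the (x, y, t) MLP input) are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class HybridDeformationField(nn.Module):
    """Hypothetical sketch: homography (global motion) + MLP (local residuals)."""

    def __init__(self, num_frames: int, hidden_dim: int = 256):
        super().__init__()
        self.num_frames = num_frames
        # One learnable homography per frame, initialized to the identity.
        self.homographies = nn.Parameter(
            torch.eye(3).unsqueeze(0).repeat(num_frames, 1, 1)
        )
        # Small MLP mapping (x, y, t) -> (dx, dy) residual deformation.
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, xy: torch.Tensor, frame_idx: torch.Tensor) -> torch.Tensor:
        # xy: (N, 2) pixel coords in [-1, 1]; frame_idx: (N,) integer frame indices.
        H = self.homographies[frame_idx]                       # (N, 3, 3)
        ones = torch.ones_like(xy[:, :1])
        homog = torch.cat([xy, ones], dim=-1).unsqueeze(-1)    # (N, 3, 1)
        warped = torch.bmm(H, homog).squeeze(-1)               # apply homography
        warped = warped[:, :2] / (warped[:, 2:3] + 1e-8)       # perspective divide
        t = frame_idx.float().unsqueeze(-1) / self.num_frames  # normalized time
        residual = self.mlp(torch.cat([xy, t], dim=-1))        # local correction
        return warped + residual                               # canonical coords
```

In the full method, these canonical-space coordinates would be used to sample a separately represented canonical image, whose appearance is supervised by the diffusion prior during training; the sketch covers only the coordinate mapping.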
