
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

June 10, 2024
Authors: Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu
cs.AI

Abstract

We propose NaRCan, a video editing framework that integrates a hybrid deformation field with a diffusion prior to generate high-quality, natural canonical images representing the input video. Our approach uses homography to model global motion and multi-layer perceptrons (MLPs) to capture local residual deformations, enhancing the model's ability to handle complex video dynamics. By introducing the diffusion prior from the early stages of training, our model ensures that the generated images retain a high-quality natural appearance, making the produced canonical images suitable for various downstream video editing tasks, a capability current canonical-based methods do not achieve. Furthermore, we incorporate low-rank adaptation (LoRA) fine-tuning and introduce a noise and diffusion-prior update scheduling technique that accelerates training by a factor of 14. Extensive experiments show that our method outperforms existing approaches across various video editing tasks and produces coherent, high-quality edited video sequences. See our project page for video results: https://koi953215.github.io/NaRCan_page/.
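To make the hybrid deformation field described in the abstract concrete, here is a minimal PyTorch sketch: a per-frame 3×3 homography models global motion, and a small MLP predicts local residual offsets on top of the homography warp. All names (`HybridDeformationField`, `residual_mlp`, the coordinate conventions) are hypothetical illustrations under these assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class HybridDeformationField(nn.Module):
    """Sketch of a hybrid deformation field: one homography per frame
    for global motion, plus an MLP for local residual deformations."""

    def __init__(self, num_frames: int, hidden_dim: int = 256):
        super().__init__()
        # One 3x3 homography per frame, initialized to the identity,
        # so the field starts as a no-op warp.
        self.homographies = nn.Parameter(
            torch.eye(3).unsqueeze(0).repeat(num_frames, 1, 1)
        )
        # MLP mapping (x, y, t) -> residual offset (dx, dy).
        self.residual_mlp = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, xy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        """Map pixel coordinates xy (N, 2) in frames t (N,) to
        canonical-image coordinates (N, 2)."""
        # Global motion: apply each frame's homography in homogeneous coords.
        ones = torch.ones_like(xy[:, :1])
        xyh = torch.cat([xy, ones], dim=-1)                   # (N, 3)
        H = self.homographies[t]                              # (N, 3, 3)
        warped = torch.bmm(H, xyh.unsqueeze(-1)).squeeze(-1)  # (N, 3)
        warped = warped[:, :2] / (warped[:, 2:3] + 1e-8)      # perspective divide
        # Local residual deformation predicted by the MLP.
        t_norm = t.float().unsqueeze(-1) / max(self.homographies.shape[0] - 1, 1)
        residual = self.residual_mlp(torch.cat([xy, t_norm], dim=-1))
        return warped + residual

# Usage: map 1,000 pixel coordinates from frame 5 into the canonical image.
field = HybridDeformationField(num_frames=60)
xy = torch.rand(1000, 2)
t = torch.full((1000,), 5, dtype=torch.long)
canonical_xy = field(xy, t)  # (1000, 2)
```

Initializing the homographies to the identity lets the MLP begin from a near-zero residual, matching the intuition that most frame-to-frame motion is global camera motion with small local corrections layered on top.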
