NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
June 10, 2024
Authors: Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu
cs.AI
Abstract
We propose NaRCan, a video editing framework that integrates a hybrid
deformation field with a diffusion prior to generate high-quality, natural
canonical images representing the input video. Our approach uses homography
to model global motion and employs multi-layer perceptrons (MLPs) to capture
local residual deformations, enhancing the model's ability to handle complex
video dynamics. By introducing a diffusion prior from the early stages of
training, our model ensures that the generated images retain a high-quality
natural appearance, making the produced canonical images suitable for various
downstream tasks in video editing, a capability not achieved by current
canonical-based methods. Furthermore, we incorporate low-rank adaptation (LoRA)
fine-tuning and introduce a noise and diffusion prior update scheduling
technique that accelerates the training process by 14 times. Extensive
experimental results show that our method outperforms existing approaches in
various video editing tasks and produces coherent and high-quality edited video
sequences. See our project page for video results at
https://koi953215.github.io/NaRCan_page/.
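To make the hybrid deformation field concrete, here is a minimal PyTorch sketch pairing a per-frame learnable homography (global motion) with a small MLP that predicts local residual offsets, as the abstract describes. The layer sizes, coordinate convention, and module interface are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HybridDeformationField(nn.Module):
    """Maps a pixel coordinate (x, y) in frame t to canonical-image space.

    A per-frame homography models global motion; a small MLP adds a local
    residual deformation. Sizes and interfaces are assumptions for
    illustration only.
    """

    def __init__(self, num_frames: int, hidden: int = 256, depth: int = 4):
        super().__init__()
        # One learnable 3x3 homography per frame, initialized to identity.
        self.H = nn.Parameter(torch.eye(3).repeat(num_frames, 1, 1))
        layers, in_dim = [], 3  # input: warped (x, y) plus normalized t
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(hidden, 2))  # residual offset (dx, dy)
        self.mlp = nn.Sequential(*layers)

    def forward(self, xy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # xy: (N, 2) pixel coords in [-1, 1]; t: (N,) integer frame indices.
        ones = torch.ones_like(xy[:, :1])
        homog = torch.cat([xy, ones], dim=-1)                 # (N, 3)
        warped = torch.einsum('nij,nj->ni', self.H[t], homog)
        # Identity init keeps the denominator near 1; epsilon avoids div-by-0.
        warped = warped[:, :2] / (warped[:, 2:] + 1e-8)
        t_norm = t.float().unsqueeze(-1) / max(len(self.H) - 1, 1)
        residual = self.mlp(torch.cat([warped, t_norm], dim=-1))
        return warped + residual  # canonical-space coordinates
```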
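The noise and diffusion-prior update scheduling might look roughly like the sketch below: the prior loss is evaluated only every few optimization steps, with a noise level annealed as training progresses, which is one plausible way to obtain the reported speedup. The interval, loss weight, and schedule shape are assumptions, and `diffusion_loss_fn` is a hypothetical callable standing in for the diffusion-prior term.

```python
import torch

def training_step(step: int, canonical: torch.Tensor,
                  recon_loss: torch.Tensor, diffusion_loss_fn,
                  prior_every: int = 10, max_steps: int = 10_000,
                  w_prior: float = 0.1) -> torch.Tensor:
    """Combine reconstruction and diffusion-prior losses for one step.

    The prior term is applied only every `prior_every` steps, with a noise
    level that decays over training. All constants here are illustrative;
    the paper's exact schedule may differ.
    """
    loss = recon_loss
    if step % prior_every == 0:
        # Anneal the noise level from high to low across training.
        noise_level = 0.98 - 0.5 * (step / max_steps)
        loss = loss + w_prior * diffusion_loss_fn(canonical, noise_level)
    return loss
```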