DiffuEraser: A Diffusion Model for Video Inpainting
January 17, 2025
Authors: Xiaowen Li, Haolan Xue, Peiran Ren, Liefeng Bo
cs.AI
Abstract
Recent video inpainting algorithms integrate flow-based pixel propagation
with transformer-based generation to leverage optical flow for restoring
textures and objects using information from neighboring frames, while
completing masked regions through visual Transformers. However, these
approaches often encounter blurring and temporal inconsistencies when dealing
with large masks, highlighting the need for models with enhanced generative
capabilities. Recently, diffusion models have emerged as a prominent technique
in image and video generation due to their impressive performance. In this
paper, we introduce DiffuEraser, a video inpainting model based on stable
diffusion, designed to fill masked regions with greater detail and more
coherent structures. We incorporate prior information to provide initialization
and weak conditioning, which helps mitigate noisy artifacts and suppress
hallucinations. Additionally, to improve temporal consistency during
long-sequence inference, we expand the temporal receptive fields of both the
prior model and DiffuEraser, and further enhance consistency by leveraging the
temporal smoothing property of Video Diffusion Models. Experimental results
demonstrate that our proposed method outperforms state-of-the-art techniques in
both content completeness and temporal consistency while maintaining acceptable
efficiency.
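The idea of using prior information for "initialization and weak conditioning" can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the function `inpaint_with_prior`, its interface, and the stand-in `denoise_step` callable are all hypothetical, and a real system would use a learned video diffusion denoiser operating in latent space.

```python
import numpy as np

def inpaint_with_prior(frame, mask, prior_fill, denoise_step,
                       num_steps=10, rng=None):
    """Toy sketch: diffusion-style inpainting seeded by a prior model.

    frame       : (H, W) array of known pixel values
    mask        : (H, W) boolean array, True where pixels are missing
    prior_fill  : (H, W) array from a prior model (e.g. a flow-based
                  inpainter), used to initialize the masked region
    denoise_step: callable(x, t) -> slightly cleaner x; a stand-in for
                  the learned denoiser (hypothetical interface)
    """
    rng = rng or np.random.default_rng(0)
    # Initialize the masked region from the prior plus moderate noise,
    # rather than from pure noise -- the "initialization and weak
    # conditioning" idea, which limits hallucinated content.
    noise = 0.5 * rng.standard_normal(frame.shape)
    x = np.where(mask, prior_fill + noise, frame)
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t)
        # Re-impose the known pixels after every step so generation
        # stays anchored to the unmasked content.
        x = np.where(mask, x, frame)
    return x
```

With a trivial smoothing denoiser, the output keeps unmasked pixels exactly while the masked region settles near the prior, illustrating how the prior both seeds and softly constrains generation.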