ProPainter: Improving Propagation and Transformer for Video Inpainting

Abstract

Flow-based propagation and spatiotemporal Transformers are the two mainstream mechanisms in video inpainting (VI). Although both are effective, each has limitations that hurt performance. Previous propagation-based approaches operate in either the image domain or the feature domain, but not both. Global image propagation that is isolated from learning can cause spatial misalignment when the optical flow is inaccurate. Moreover, memory and computational constraints limit the temporal range of feature propagation and video Transformers, which prevents them from exploiting correspondence information in distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping and exploits global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior methods by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
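The idea of discarding tokens under mask guidance can be illustrated with a minimal sketch. This is not the ProPainter implementation: the abstract does not say how tokens are selected, so the sketch assumes that a frame is split into non-overlapping square windows and that a window is kept only if it contains at least one masked (missing) pixel. The function names `select_masked_windows` and `kept_fraction` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def select_masked_windows(mask: List[List[int]], window: int) -> List[Tuple[int, int]]:
    """Return (row, col) indices of non-overlapping windows that contain
    at least one masked pixel (mask value 1 marks a pixel to be inpainted).

    Assumption for this sketch: windows with no masked pixel are treated
    as unnecessary and are dropped before attention.
    """
    height, width = len(mask), len(mask[0])
    if height % window or width % window:
        raise ValueError("mask size must be divisible by the window size")

    kept = []
    for wr in range(height // window):
        for wc in range(width // window):
            rows = range(wr * window, (wr + 1) * window)
            cols = range(wc * window, (wc + 1) * window)
            # Keep the window if any pixel inside it is masked.
            if any(mask[r][c] for r in rows for c in cols):
                kept.append((wr, wc))
    return kept


def kept_fraction(mask: List[List[int]], window: int) -> float:
    """Fraction of windows that survive the mask-guided selection."""
    total = (len(mask) // window) * (len(mask[0]) // window)
    return len(select_masked_windows(mask, window)) / total


# A 4x8 frame mask with a 2x2 hole that straddles two 2x2 windows.
example_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(select_masked_windows(example_mask, 2))  # [(0, 1), (1, 1)]
print(kept_fraction(example_mask, 2))          # 0.25
```

In this toy case only 2 of the 8 windows would take part in attention. Under the stated assumption, the saving grows as the masked region shrinks relative to the frame.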
