ProPainter: Improving Propagation and Transformer for Video Inpainting

Abstract

Flow-based propagation and spatiotemporal Transformers are the two mainstream mechanisms in video inpainting (VI). Although both are effective, each has limitations that hold back performance. Previous propagation-based approaches operate in either the image domain or the feature domain, but not both. Global image propagation that is isolated from learning can cause spatial misalignment when the optical flow is inaccurate. Moreover, memory and computational constraints limit the temporal range of feature propagation and video Transformers, which prevents them from exploiting correspondence information in distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation, which combines the advantages of image warping and feature warping to exploit global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior methods by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
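The idea of discarding unnecessary tokens under mask guidance can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `select_masked_windows`, the non-overlapping window partition, and the rule "keep a window if any frame has a masked pixel inside it" are all hypothetical choices made here for clarity.

```python
from typing import List, Tuple

Frame = List[List[int]]  # 2D binary mask: 1 = pixel to inpaint, 0 = known pixel


def select_masked_windows(masks: List[Frame], window: int) -> List[Tuple[int, int]]:
    """Return (row, col) indices of windows containing a masked pixel in any frame.

    Assumption: only these windows would be passed on to attention; the rest
    are skipped because they contain nothing to fill in.
    """
    height, width = len(masks[0]), len(masks[0][0])
    rows = (height + window - 1) // window  # ceiling division covers ragged edges
    cols = (width + window - 1) // window
    keep = []
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * window, c * window
            touched = any(
                frame[y][x]
                for frame in masks
                for y in range(y0, min(y0 + window, height))
                for x in range(x0, min(x0 + window, width))
            )
            if touched:
                keep.append((r, c))
    return keep


# Hypothetical toy input: two 4x4 frames, with a hole in the top-left corner
# of frame 0 and in the bottom-right corner of frame 1.
masks = [
    [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
]
kept = select_masked_windows(masks, window=2)
print(kept)
```

In this toy case two of the four windows are kept, so an attention layer restricted to them would process half as many tokens. The real savings depend on mask size and window layout.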
