SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Abstract

Recent advances in diffusion-based video restoration (VR) have significantly improved visual quality, but at a prohibitive computational cost during inference. While several distillation-based approaches have shown the potential of one-step image restoration, extending them to VR remains challenging and underexplored, particularly for high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle challenging high-resolution VR within a single step, we introduce several enhancements to both the model architecture and the training procedure. Specifically, we propose an adaptive window attention mechanism in which the window size is dynamically adjusted to fit the output resolution, avoiding the window inconsistency observed in high-resolution VR when window attention uses a predefined window size. To stabilize and improve adversarial post-training for VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss, without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 achieves performance comparable to or even better than existing VR approaches in a single step.
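As a rough illustration of the adaptive-window idea, the sketch below derives the window size from the feature-map resolution instead of fixing it in advance. The abstract does not state the actual rule, so everything here is an assumption made for illustration: the function names, the `windows_per_side` parameter, and the ceil-division rule are hypothetical and are not taken from SeedVR2.

```python
import math


def adaptive_window_size(height, width, windows_per_side=4):
    """Derive a window size from the feature-map resolution.

    Hypothetical rule (an assumption, not the paper's formula): split each
    axis into at most `windows_per_side` windows using ceil division.
    """
    if height <= 0 or width <= 0 or windows_per_side <= 0:
        raise ValueError("height, width and windows_per_side must be positive")
    win_h = math.ceil(height / windows_per_side)
    win_w = math.ceil(width / windows_per_side)
    return win_h, win_w


def partition(height, width, win_h, win_w):
    """List (top, left, h, w) for every window; edge windows may be smaller."""
    windows = []
    for top in range(0, height, win_h):
        for left in range(0, width, win_w):
            h = min(win_h, height - top)
            w = min(win_w, width - left)
            windows.append((top, left, h, w))
    return windows


def smallest_window(windows):
    """Return the (h, w) of the window with the smallest area."""
    _, _, h, w = min(windows, key=lambda item: item[2] * item[3])
    return h, w


if __name__ == "__main__":
    feat_h, feat_w = 90, 160  # hypothetical latent feature-map size

    # Predefined window size: edge windows shrink noticeably.
    fixed = partition(feat_h, feat_w, 64, 64)
    print("fixed:", len(fixed), "windows, smallest", smallest_window(fixed))

    # Resolution-dependent window size: windows stay close to uniform.
    win_h, win_w = adaptive_window_size(feat_h, feat_w)
    adaptive = partition(feat_h, feat_w, win_h, win_w)
    print("adaptive:", len(adaptive), "windows, smallest", smallest_window(adaptive))
```

With a predefined 64×64 window, the 90×160 map ends up with leftover edge windows far smaller than the rest, whereas the resolution-dependent choice keeps all windows close to the same size. This is one plausible reading of the inconsistency the abstract mentions, not a statement of the authors' mechanism.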
