
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

June 5, 2025
作者: Jianyi Wang, Shanchuan Lin, Zhijie Lin, Yuxi Ren, Meng Wei, Zongsheng Yue, Shangchen Zhou, Hao Chen, Yang Zhao, Ceyuan Yang, Xuefeng Xiao, Chen Change Loy, Lu Jiang
cs.AI

Abstract

Recent advances in diffusion-based video restoration (VR) demonstrate significant improvement in visual quality, yet incur a prohibitive computational cost during inference. While several distillation-based approaches have exhibited the potential of one-step image restoration, extending existing approaches to VR remains challenging and underexplored, particularly when dealing with high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle the challenge of high-resolution VR within a single step, we introduce several enhancements to both the model architecture and the training procedure. Specifically, an adaptive window attention mechanism is proposed, where the window size is dynamically adjusted to fit the output resolution, avoiding the window inconsistency observed in high-resolution VR when window attention uses a predefined window size. To stabilize and improve adversarial post-training for VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss, without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 can achieve comparable or even better performance than existing VR approaches in a single step.
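The adaptive window attention described above can be illustrated with a minimal sketch. The paper does not give a formula in this abstract, so the scaling rule and function below are assumptions: the base window is scaled in proportion to the output resolution and then shrunk to a divisor of the spatial extent, so every window tiles the frame exactly and no inconsistent partial windows appear at the borders.

```python
def adaptive_window_size(height, width, base_window=8, base_resolution=64):
    """Hypothetical sketch of resolution-adaptive window sizing.

    Scales a base window size with the output resolution so the number of
    windows per axis stays roughly constant, then rounds down to a divisor
    of the dimension so windows tile the frame without partial remainders.
    All defaults here are illustrative assumptions, not the paper's values.
    """
    def fit(dim):
        # scale the window proportionally to the output resolution
        target = max(1, round(base_window * dim / base_resolution))
        # shrink until the window size divides the dimension evenly
        while dim % target != 0:
            target -= 1
        return target

    return fit(height), fit(width)
```

For example, a 64x64 latent keeps the base 8x8 window, while a 128x128 output gets a 16x16 window, preserving the same 8x8 grid of windows rather than leaving ragged partial windows at the edges.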
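The feature matching loss mentioned in the abstract is, in common GAN formulations, an L1 distance between discriminator feature maps of real and generated samples, averaged over layers. The sketch below follows that standard formulation as an assumption; the paper's exact definition may differ.

```python
import numpy as np

def feature_matching_loss(real_feats, fake_feats):
    """Sketch of a standard GAN feature matching loss (assumed formulation).

    real_feats / fake_feats: lists of per-layer discriminator feature arrays
    for the real and restored videos. Returns the mean absolute difference
    per layer, averaged over layers.
    """
    per_layer = [float(np.mean(np.abs(f - r)))
                 for r, f in zip(real_feats, fake_feats)]
    return float(np.mean(per_layer))
```

Because it supervises intermediate discriminator statistics rather than only the final real/fake score, this kind of loss is often used to stabilize adversarial training, consistent with the stabilization role described in the abstract.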