SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Abstract

Recent advances in diffusion-based video restoration (VR) have significantly improved visual quality, but they incur a prohibitive computational cost during inference. Although several distillation-based approaches have shown the potential of one-step image restoration, extending them to VR remains challenging and underexplored, particularly for high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle high-resolution VR within a single step, we introduce several enhancements to both the model architecture and the training procedure. Specifically, we propose an adaptive window attention mechanism in which the window size is dynamically adjusted to fit the output resolution. This avoids the window inconsistency observed in high-resolution VR when window attention uses a predefined window size. To stabilize and improve adversarial post-training for VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss that does not significantly sacrifice training efficiency. Extensive experiments show that SeedVR2 achieves comparable or even better performance than existing VR approaches in a single step.
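The abstract does not spell out how the window size is derived, so the following is only a minimal sketch of the general idea, under the assumption that the window size is computed from the feature-map size and a target window count. The function names `fixed_windows` and `adaptive_windows` and the `num_windows` parameter are hypothetical and are not taken from the paper. The sketch works in one dimension and contrasts a predefined window, which can leave a small leftover window at the border, with a window derived from the input size.

```python
import math


def fixed_windows(size, window):
    """Return the 1-D window lengths when tiling `size` with a predefined window.

    If `size` is not a multiple of `window`, the last window is smaller,
    which is one way windows can become inconsistent across resolutions.
    """
    full, rest = divmod(size, window)
    return [window] * full + ([rest] if rest else [])


def adaptive_windows(size, num_windows):
    """Return the 1-D window lengths when the window is derived from `size`.

    Assumption for illustration only: the window length is
    ceil(size / num_windows), so the tiling uses at most `num_windows`
    windows of nearly equal length.
    """
    window = math.ceil(size / num_windows)
    return fixed_windows(size, window)


if __name__ == "__main__":
    print(fixed_windows(90, 64))    # [64, 26]: uneven border window
    print(adaptive_windows(90, 2))  # [45, 45]: windows fit the input size
```

In this toy setting the adaptive variant keeps the windows nearly equal in length for any input size, whereas the fixed variant produces a leftover window whenever the size is not a multiple of the predefined window.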
