SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Abstract

Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) inputs, yet most existing VSR approaches behave like black boxes at inference time: users cannot reliably correct unexpected artifacts and can only accept whatever the model produces. In this paper, we propose a novel interactive VSR framework, dubbed SparkVSR, that makes sparse keyframes a simple and expressive control signal. Specifically, users can first optionally super-resolve a small set of keyframes using any off-the-shelf image super-resolution (ISR) model; SparkVSR then propagates the keyframe priors to the entire video sequence while remaining grounded in the motion of the original LR video. Concretely, we introduce a keyframe-conditioned latent-pixel two-stage training pipeline that fuses LR video latents with sparsely encoded high-resolution (HR) keyframe latents to learn robust cross-space propagation and refine perceptual details. At inference time, SparkVSR supports flexible keyframe selection (manual specification, codec I-frame extraction, or random sampling) and a reference-free guidance mechanism that continuously balances keyframe adherence and blind restoration, ensuring robust performance even when reference keyframes are absent or imperfect. Experiments on multiple VSR benchmarks demonstrate improved temporal consistency and strong restoration quality, surpassing baselines by up to 24.6%, 21.8%, and 5.6% on CLIP-IQA, DOVER, and MUSIQ, respectively, while enabling controllable, keyframe-driven video super-resolution. Moreover, we demonstrate that SparkVSR is a generic interactive, keyframe-conditioned video processing framework, as it can be applied out of the box to unseen tasks such as old-film restoration and video style transfer. Our project page is available at: https://sparkvsr.github.io/
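The three keyframe-selection modes and the trade-off between keyframe adherence and blind restoration mentioned in the abstract can be illustrated with a small Python sketch. This is not SparkVSR's implementation: the function names `select_keyframes` and `blend_guidance`, their parameters, and the classifier-free-guidance-style linear blend are all assumptions made for illustration. The I-frame mode further assumes that per-frame picture types (for example, as reported by a codec probe) are already available as a list.

```python
import random
from typing import List, Optional, Sequence


def select_keyframes(
    num_frames: int,
    mode: str = "random",
    manual: Optional[Sequence[int]] = None,
    frame_types: Optional[Sequence[str]] = None,
    k: int = 3,
    seed: int = 0,
) -> List[int]:
    """Return sorted keyframe indices (hypothetical helper, not the paper's API)."""
    if mode == "manual":
        # The user names the frames that should act as keyframes.
        if manual is None:
            raise ValueError("manual mode needs a list of frame indices")
        indices = sorted(set(manual))
    elif mode == "iframe":
        # Assumed input: one picture type per frame, e.g. "I", "P" or "B".
        if frame_types is None:
            raise ValueError("iframe mode needs per-frame picture types")
        indices = [i for i, t in enumerate(frame_types) if t == "I"]
    elif mode == "random":
        # Draw k distinct frames; a fixed seed keeps the choice reproducible.
        rng = random.Random(seed)
        indices = sorted(rng.sample(range(num_frames), min(k, num_frames)))
    else:
        raise ValueError(f"unknown mode: {mode}")
    if any(i < 0 or i >= num_frames for i in indices):
        raise ValueError("keyframe index out of range")
    return indices


def blend_guidance(
    blind: Sequence[float], keyframe_cond: Sequence[float], scale: float
) -> List[float]:
    """Assumed linear blend: scale=0 gives the blind prediction,
    scale=1 gives the keyframe-conditioned prediction."""
    return [b + scale * (c - b) for b, c in zip(blind, keyframe_cond)]
```

Under these assumptions, a scale of 0 corresponds to purely blind restoration (no keyframes needed), while larger values pull the result toward the keyframe-conditioned prediction; the actual guidance rule used by SparkVSR may differ.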

