

V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

March 13, 2026
Authors: Shenghe Zheng, Junpeng Jiang, Wenbo Li
cs.AI

Abstract

Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of the data used by existing restoration methods), pretrained video models can be induced to perform competitive image restoration, handling multiple tasks with a single model and rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with extremely limited data. This challenges the traditional boundary between generative modeling and low-level vision and opens a new design paradigm for foundation models in visual tasks.