
V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

March 13, 2026
Authors: Shenghe Zheng, Junpeng Jiang, Wenbo Li
cs.AI

Abstract

Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model, rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with only extremely limited data, challenging the traditional boundary between generative modeling and low-level vision, and opening a new design paradigm for foundation models in visual tasks.
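The core idea — reinterpreting restoration as a progressive trajectory from degraded input to clean output, rather than one-shot regression — can be illustrated with a minimal sketch. This is not the authors' implementation: the pretrained video model is replaced here by a hypothetical `toy_denoiser` placeholder, and the frame count and schedule are illustrative assumptions.

```python
import random

def progressive_restore(degraded, denoise_step, num_frames=8):
    """Toy sketch: restoration as a short 'video' from degraded to clean.

    degraded:     2-D list of floats, the corrupted input (frame 0).
    denoise_step: callable that slightly improves a frame; stands in for
                  one sampling step of a pretrained video generative model.
    Returns the full frame trajectory; the last frame is the restored image.
    """
    frames = [degraded]
    x = degraded
    for t in range(1, num_frames):
        # Later frames apply progressively stronger refinement.
        x = denoise_step(x, t / (num_frames - 1))
        frames.append(x)
    return frames

def toy_denoiser(frame, alpha):
    # Hypothetical stand-in for the learned video prior: blend each
    # pixel toward its row mean (a crude smoother), weighted by alpha.
    out = []
    for row in frame:
        mean = sum(row) / len(row)
        out.append([(1 - alpha) * v + alpha * mean for v in row])
    return out

random.seed(0)
noisy = [[random.gauss(0.0, 0.5) for _ in range(16)] for _ in range(16)]
frames = progressive_restore(noisy, toy_denoiser)
```

The point of the sketch is the interface, not the denoiser: a video model's sampling loop naturally produces such a frame sequence, so few-shot adaptation only has to teach it that frame 0 is the degraded observation and the final frame is the restored target.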