Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
May 14, 2025
Authors: Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, Konrad Schindler
cs.AI
Abstract
The success of deep learning in computer vision over the past decade has
hinged on large labeled datasets and strong pretrained models. In data-scarce
settings, the quality of these pretrained models becomes crucial for effective
transfer learning. Image classification and self-supervised learning have
traditionally been the primary methods for pretraining CNNs and
transformer-based architectures. Recently, the rise of text-to-image generative
models, particularly those using denoising diffusion in a latent space, has
introduced a new class of foundational models trained on massive, captioned
image datasets. These models' ability to generate realistic images of unseen
content suggests they possess a deep understanding of the visual world. In this
work, we present Marigold, a family of conditional generative models and a
fine-tuning protocol that extracts the knowledge from pretrained latent
diffusion models like Stable Diffusion and adapts them for dense image analysis
tasks, including monocular depth estimation, surface normals prediction, and
intrinsic decomposition. Marigold requires minimal modification of the
pretrained latent diffusion model's architecture, trains with small synthetic
datasets on a single GPU over a few days, and demonstrates state-of-the-art
zero-shot generalization. Project page:
https://marigoldcomputervision.github.io