Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
May 14, 2025
Authors: Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, Konrad Schindler
cs.AI
Abstract
The success of deep learning in computer vision over the past decade has
hinged on large labeled datasets and strong pretrained models. In data-scarce
settings, the quality of these pretrained models becomes crucial for effective
transfer learning. Image classification and self-supervised learning have
traditionally been the primary methods for pretraining CNNs and
transformer-based architectures. Recently, the rise of text-to-image generative
models, particularly those using denoising diffusion in a latent space, has
introduced a new class of foundation models trained on massive, captioned
image datasets. These models' ability to generate realistic images of unseen
content suggests they possess a deep understanding of the visual world. In this
work, we present Marigold, a family of conditional generative models and a
fine-tuning protocol that extracts the knowledge from pretrained latent
diffusion models like Stable Diffusion and adapts them for dense image analysis
tasks, including monocular depth estimation, surface normal prediction, and
intrinsic decomposition. Marigold requires minimal modification of the
pretrained latent diffusion model's architecture, trains with small synthetic
datasets on a single GPU over a few days, and demonstrates state-of-the-art
zero-shot generalization. Project page:
https://marigoldcomputervision.github.io
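The abstract does not say what the "minimal modification" of the architecture consists of. As an illustration only, assume the input image reaches the denoiser by concatenating its latent with the noisy latent of the target (for example, a depth map) along the channel axis. The denoiser's first convolution then has to accept twice as many input channels. One simple way to widen it without disturbing the pretrained behavior is to duplicate the kernel along the input-channel axis and halve it, so that two identical input latents reproduce the original output. This sketch uses NumPy in place of a deep learning framework and a 1x1 kernel for brevity. The function names and the 320/4 channel counts are hypothetical.

```python
import numpy as np

# Hypothetical sketch: assumes the image latent and the noisy target latent
# are concatenated along the channel axis before the first convolution.


def expand_input_channels(weight: np.ndarray, bias: np.ndarray):
    """Widen a kernel of shape (out_ch, in_ch, kh, kw) to 2 * in_ch inputs.

    The kernel is duplicated along the input-channel axis and halved, so two
    identical inputs produce exactly the original layer's output.
    """
    widened = np.concatenate([weight, weight], axis=1) / 2.0
    return widened, bias.copy()


def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Apply a 1x1 convolution; x has shape (in_ch, H, W)."""
    return np.einsum("oi,ihw->ohw", weight[:, :, 0, 0], x) + bias[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((320, 4, 1, 1))  # hypothetical pretrained kernel
    b = rng.standard_normal(320)
    w2, b2 = expand_input_channels(w, b)
    z = rng.standard_normal((4, 16, 16))  # stand-in for a 4-channel latent
    stacked = np.concatenate([z, z], axis=0)  # 8 input channels
    print(w2.shape)  # (320, 8, 1, 1)
    print(np.allclose(conv1x1(stacked, w2, b2), conv1x1(z, w, b)))  # True
```

Because a convolution is linear in its weights, the same identity holds for larger kernels such as 3x3. Halving also keeps activation magnitudes at the scale the pretrained layers expect, instead of doubling them.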