Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

December 20, 2023
Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
cs.AI

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity that arises from unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
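
Two of the abstract's ingredients lend themselves to a concrete illustration: log-scale depth parameterization and synthetic FOV augmentation. The sketch below is a minimal interpretation, not the authors' implementation; the function names, the [-1, 1] target range, and the d_min/d_max bounds are assumptions made for illustration.

```python
import numpy as np

def fov_from_intrinsics(focal_px: float, width_px: int) -> float:
    """Horizontal field of view in radians, from focal length and image width (pixels)."""
    return 2.0 * np.arctan(width_px / (2.0 * focal_px))

def normalize_log_depth(depth_m: np.ndarray, d_min: float = 0.5, d_max: float = 80.0) -> np.ndarray:
    """Map metric depth (meters) to [-1, 1] in log space.

    Log-scaling allots more of the representable range to near (indoor) depths
    while still covering far (outdoor) depths with a single parameterization.
    d_min / d_max are illustrative bounds, not the paper's values.
    """
    d = np.clip(depth_m, d_min, d_max)
    z = (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))  # in [0, 1]
    return 2.0 * z - 1.0

def augment_fov_center_crop(image: np.ndarray, depth_m: np.ndarray,
                            focal_px: float, crop_frac: float):
    """Simulate a narrower FOV: center-crop image and depth, keep the focal length.

    With focal length fixed, a smaller "sensor" (the cropped width) yields a
    smaller FOV, so training sees (image, depth, FOV) triples beyond the
    camera intrinsics present in the raw data.
    """
    h, w = image.shape[:2]
    nh, nw = int(round(h * crop_frac)), int(round(w * crop_frac))
    y0, x0 = (h - nh) // 2, (w - nw) // 2
    cropped_img = image[y0:y0 + nh, x0:x0 + nw]
    cropped_depth = depth_m[y0:y0 + nh, x0:x0 + nw]
    new_fov = fov_from_intrinsics(focal_px, nw)
    return cropped_img, cropped_depth, new_fov
```

The reverse direction, padding the image to simulate a wider FOV, follows the same intrinsics arithmetic. In a setup like this, the resulting FOV value would be supplied to the diffusion model as a conditioning signal (e.g., via an embedding), though the exact conditioning mechanism here is an assumption rather than a detail stated in the abstract.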