Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

December 20, 2023
Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
cs.AI

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity that arises from unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
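
Two of the abstract's ingredients lend themselves to a concrete illustration: log-scale depth parameterization and synthetic FOV augmentation. The sketch below is a minimal interpretation, not the authors' implementation; the function names, the [-1, 1] target range, and the d_min/d_max bounds are assumptions made for illustration.

```python
import numpy as np

def fov_from_intrinsics(focal_px: float, width_px: int) -> float:
    """Horizontal field of view in radians, from focal length and image width (pixels)."""
    return 2.0 * np.arctan(width_px / (2.0 * focal_px))

def normalize_log_depth(depth_m: np.ndarray, d_min: float = 0.5, d_max: float = 80.0) -> np.ndarray:
    """Map metric depth (meters) to [-1, 1] in log space.

    Log-scaling allots more of the representable range to near (indoor) depths
    while still covering far (outdoor) depths with a single parameterization.
    d_min / d_max are illustrative bounds, not the paper's values.
    """
    d = np.clip(depth_m, d_min, d_max)
    z = (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))  # in [0, 1]
    return 2.0 * z - 1.0

def augment_fov_center_crop(image: np.ndarray, depth_m: np.ndarray,
                            focal_px: float, crop_frac: float):
    """Simulate a narrower FOV: center-crop image and depth, keep the focal length.

    With focal length fixed, a smaller "sensor" (the cropped width) yields a
    smaller FOV, so training sees (image, depth, FOV) triples beyond the
    camera intrinsics present in the raw data.
    """
    h, w = image.shape[:2]
    nh, nw = int(round(h * crop_frac)), int(round(w * crop_frac))
    y0, x0 = (h - nh) // 2, (w - nw) // 2
    cropped_img = image[y0:y0 + nh, x0:x0 + nw]
    cropped_depth = depth_m[y0:y0 + nh, x0:x0 + nw]
    new_fov = fov_from_intrinsics(focal_px, nw)
    return cropped_img, cropped_depth, new_fov
```

The reverse direction, padding the image to simulate a wider FOV, follows the same intrinsics arithmetic. In a setup like this, the resulting FOV value would be supplied to the diffusion model as a conditioning signal (e.g., via an embedding), though the exact conditioning mechanism here is an assumption rather than a detail stated in the abstract.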