

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

December 20, 2023
Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
cs.AI

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity that arises from unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: a log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
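
The log-scale depth parameterization mentioned above can be made concrete with a small sketch. The snippet below maps metric depth to a normalized [-1, 1] range in log space, which allocates more of the representation to the near depths typical of indoor scenes while still covering far outdoor ranges. This is a minimal illustration, not the paper's implementation: the bounds `d_min` and `d_max` and the function names are assumptions made for the example.

```python
import numpy as np

def normalize_depth_log(depth, d_min=0.5, d_max=80.0):
    """Map metric depth (meters) to [-1, 1] in log space.

    d_min / d_max are illustrative bounds, not values from the paper.
    Log scaling gives finer resolution to near (indoor) depths while
    still covering far (outdoor) ranges in the same normalized target.
    """
    depth = np.clip(depth, d_min, d_max)
    t = (np.log(depth) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return 2.0 * t - 1.0  # [0, 1] -> [-1, 1]

def denormalize_depth_log(x, d_min=0.5, d_max=80.0):
    """Inverse of normalize_depth_log: recover metric depth."""
    t = (np.asarray(x) + 1.0) / 2.0
    return np.exp(t * (np.log(d_max) - np.log(d_min)) + np.log(d_min))
```

A linear parameterization over the same bounds would spend most of its range on distant outdoor depths; the log mapping is what lets one model serve both regimes.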
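FOV conditioning and synthetic FOV augmentation can likewise be sketched: the vertical FOV is derived from the camera intrinsics, and narrower FOVs are simulated by center-cropping, which leaves metric depth values untouched while changing the conditioning signal. This assumes a pinhole camera with focal length `fy` in pixels; the proportional width crop and the function names are illustrative choices, and a full pipeline would also handle uncropping (padding) to widen the FOV.

```python
import numpy as np

def vertical_fov(height, fy):
    """Vertical field of view (radians) for a pinhole camera."""
    return 2.0 * np.arctan(height / (2.0 * fy))

def crop_to_fov(image, depth, fy, target_fov):
    """Center-crop image/depth to simulate a narrower vertical FOV.

    Metric depth values are unchanged by cropping; only the FOV used
    as the model's conditioning signal changes, which is the point of
    the augmentation. Illustrative sketch, not the paper's code.
    """
    h, w = depth.shape
    new_h = int(round(2.0 * fy * np.tan(target_fov / 2.0)))
    assert new_h <= h, "cropping can only narrow the FOV"
    top = (h - new_h) // 2
    # Crop width proportionally to preserve aspect ratio (a design
    # choice for fixed-resolution models, not required by geometry).
    new_w = int(round(w * new_h / h))
    left = (w - new_w) // 2
    return (image[top:top + new_h, left:left + new_w],
            depth[top:top + new_h, left:left + new_w],
            target_fov)
```

Sampling a random `target_fov` per training example in this way exposes the model to intrinsics beyond those present in the training datasets, which is what the abstract credits for generalization to unknown cameras.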