

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

December 20, 2023
Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
cs.AI

Abstract

While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity that arises from unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: a log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview, see https://diffusion-vision.github.io/dmd
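
The log-scale depth parameterization mentioned above can be made concrete with a small sketch. The snippet below maps metric depth to a normalized [-1, 1] range in log space, which allocates more of the representation to the near depths typical of indoor scenes while still covering far outdoor ranges. This is a minimal illustration, not the paper's implementation: the bounds `d_min` and `d_max` and the function names are assumptions made for the example.

```python
import numpy as np

def normalize_depth_log(depth, d_min=0.5, d_max=80.0):
    """Map metric depth (meters) to [-1, 1] in log space.

    d_min / d_max are illustrative bounds, not values from the paper.
    Log scaling gives finer resolution to near (indoor) depths while
    still covering far (outdoor) ranges in the same normalized target.
    """
    depth = np.clip(depth, d_min, d_max)
    t = (np.log(depth) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return 2.0 * t - 1.0  # [0, 1] -> [-1, 1]

def denormalize_depth_log(x, d_min=0.5, d_max=80.0):
    """Inverse of normalize_depth_log: recover metric depth."""
    t = (np.asarray(x) + 1.0) / 2.0
    return np.exp(t * (np.log(d_max) - np.log(d_min)) + np.log(d_min))
```

A linear parameterization over the same bounds would spend most of its range on distant outdoor depths; the log mapping is what lets one model serve both regimes.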
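FOV conditioning and synthetic FOV augmentation can likewise be sketched: the vertical FOV is derived from the camera intrinsics, and narrower FOVs are simulated by center-cropping, which leaves metric depth values untouched while changing the conditioning signal. This assumes a pinhole camera with focal length `fy` in pixels; the proportional width crop and the function names are illustrative choices, and a full pipeline would also handle uncropping (padding) to widen the FOV.

```python
import numpy as np

def vertical_fov(height, fy):
    """Vertical field of view (radians) for a pinhole camera."""
    return 2.0 * np.arctan(height / (2.0 * fy))

def crop_to_fov(image, depth, fy, target_fov):
    """Center-crop image/depth to simulate a narrower vertical FOV.

    Metric depth values are unchanged by cropping; only the FOV used
    as the model's conditioning signal changes, which is the point of
    the augmentation. Illustrative sketch, not the paper's code.
    """
    h, w = depth.shape
    new_h = int(round(2.0 * fy * np.tan(target_fov / 2.0)))
    assert new_h <= h, "cropping can only narrow the FOV"
    top = (h - new_h) // 2
    # Crop width proportionally to preserve aspect ratio (a design
    # choice for fixed-resolution models, not required by geometry).
    new_w = int(round(w * new_h / h))
    left = (w - new_w) // 2
    return (image[top:top + new_h, left:left + new_w],
            depth[top:top + new_h, left:left + new_w],
            target_fov)
```

Sampling a random `target_fov` per training example in this way exposes the model to intrinsics beyond those present in the training datasets, which is what the abstract credits for generalization to unknown cameras.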