Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
October 2, 2024
Authors: Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun
cs.AI
Abstract
We present a foundation model for zero-shot metric monocular depth
estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with
unparalleled sharpness and high-frequency details. The predictions are metric,
with absolute scale, without relying on the availability of metadata such as
camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map
in 0.3 seconds on a standard GPU. These characteristics are enabled by a number
of technical contributions, including an efficient multi-scale vision
transformer for dense prediction, a training protocol that combines real and
synthetic datasets to achieve high metric accuracy alongside fine boundary
tracing, dedicated evaluation metrics for boundary accuracy in estimated depth
maps, and state-of-the-art focal length estimation from a single image.
Extensive experiments analyze specific design choices and demonstrate that
Depth Pro outperforms prior work along multiple dimensions. We release code and
weights at https://github.com/apple/ml-depth-pro.
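The abstract emphasizes that the predicted depth is metric (absolute scale, in meters) and that the focal length is estimated from the image itself. One practical consequence is that the depth map can be unprojected into a metric 3D point cloud with a standard pinhole camera model. The sketch below illustrates that unprojection only; it is not the repository's API, and the function name, centered-principal-point assumption, and input values are ours:

```python
import numpy as np

def unproject(depth, f_px, cx=None, cy=None):
    """Unproject a metric depth map (meters) into a 3D point cloud
    using a pinhole model; principal point defaults to the image center."""
    h, w = depth.shape
    cx = (w - 1) / 2.0 if cx is None else cx
    cy = (h - 1) / 2.0 if cy is None else cy
    # Pixel coordinate grids: u runs along width, v along height.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / f_px
    y = (v - cy) * z / f_px
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3), in meters

# Illustrative input: a flat surface 2 m away, with a hypothetical
# estimated focal length of 1000 px (both values made up for the demo).
depth = np.full((480, 640), 2.0)
points = unproject(depth, f_px=1000.0)
```

Because both the depth and the focal length carry absolute scale, the resulting coordinates are directly in meters, without any per-image rescaling.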