Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Abstract

We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image. Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. We release code and weights at https://github.com/apple/ml-depth-pro.
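To illustrate why focal length estimation matters for metric predictions when camera intrinsics are unavailable, the following minimal sketch uses the standard pinhole camera model. It is not the Depth Pro implementation; the function names, image width, field of view, and object sizes are hypothetical values chosen only for illustration.

```python
import math


def focal_length_px(image_width_px: float, horizontal_fov_deg: float) -> float:
    """Pinhole relation between field of view and focal length in pixels:
    f_px = (W / 2) / tan(FOV / 2)."""
    half_fov_rad = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov_rad)


def depth_from_known_width(
    f_px: float, real_width_m: float, width_in_image_px: float
) -> float:
    """Pinhole projection x = f * X / Z, solved for the depth Z in meters."""
    return f_px * real_width_m / width_in_image_px


if __name__ == "__main__":
    # Hypothetical setup: a 1536-pixel-wide image with a 90-degree
    # horizontal field of view.
    f_wide = focal_length_px(1536, 90.0)
    print(f"focal length (px): {f_wide:.1f}")

    # The same 192-pixel footprint of a 0.5 m wide object corresponds to
    # different metric depths under different focal lengths, so absolute
    # scale cannot be recovered without knowing or estimating f_px.
    for f_px in (768.0, 1536.0):
        z = depth_from_known_width(f_px, real_width_m=0.5, width_in_image_px=192.0)
        print(f"f_px = {f_px:.0f} -> depth = {z:.1f} m")
```

Under these assumptions, doubling the focal length doubles the depth implied by the same image footprint. This is the scale ambiguity that a focal length estimate from the image itself helps to resolve.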

