Depth Anything with Any Prior
May 15, 2025
Authors: Zehan Wang, Siyu Chen, Lihe Yang, Jialei Wang, Ziang Zhang, Hengshuang Zhao, Zhou Zhao
cs.AI
Abstract
This work presents Prior Depth Anything, a framework that combines incomplete
but precise metric information in depth measurement with relative but complete
geometric structures in depth prediction, generating accurate, dense, and
detailed metric depth maps for any scene. To this end, we design a
coarse-to-fine pipeline to progressively integrate the two complementary depth
sources. First, we introduce pixel-level metric alignment and distance-aware
weighting to pre-fill diverse metric priors by explicitly using depth
prediction. This step effectively narrows the domain gap between prior patterns,
enhancing generalization across varying scenarios. Second, we develop a
conditioned monocular depth estimation (MDE) model to correct the inherent noise
in depth priors. By conditioning on the normalized pre-filled prior and
prediction, the model further implicitly merges the two complementary depth
sources. Our model showcases impressive zero-shot generalization across depth
completion, super-resolution, and inpainting over 7 real-world datasets,
matching or even surpassing previous task-specific methods. More importantly,
it performs well on challenging, unseen mixed priors and enables test-time
improvements by switching prediction models, providing a flexible
accuracy-efficiency trade-off while evolving with advancements in MDE models.
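The coarse pre-fill stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `prefill_metric_prior`, the inverse-distance weighting scheme, and the neighbor count `k` are all assumptions. The idea is to estimate a per-pixel scale factor that maps the relative prediction onto the sparse metric measurements, weighting nearby measurements more heavily (distance-aware weighting), then use the scaled prediction to fill unmeasured pixels.

```python
import numpy as np
from scipy.spatial import cKDTree

def prefill_metric_prior(sparse_metric, relative_pred, k=5):
    """Pre-fill a sparse metric depth prior using a dense relative prediction.

    sparse_metric: (H, W) array of metric depths, 0 where no measurement exists.
    relative_pred: (H, W) dense relative depth prediction (assumed positive).
    k: number of nearest measurements used per pixel (an assumed hyperparameter).
    Assumes at least one valid measurement is present.
    """
    valid = sparse_metric > 0
    vy, vx = np.nonzero(valid)
    tree = cKDTree(np.column_stack([vy, vx]))

    # Local scale factor at each measured pixel: metric / relative.
    scales = sparse_metric[valid] / np.maximum(relative_pred[valid], 1e-6)

    # For every unmeasured pixel, find the k nearest measurements.
    ys, xs = np.nonzero(~valid)
    kk = min(k, len(vy))
    dists, idx = tree.query(np.column_stack([ys, xs]), k=kk)
    dists = dists.reshape(len(ys), -1)
    idx = idx.reshape(len(ys), -1)

    # Distance-aware weighting: closer measurements dominate the local scale.
    w = 1.0 / (dists + 1e-6)
    w /= w.sum(axis=1, keepdims=True)
    px_scale = (w * scales[idx]).sum(axis=1)

    # Keep measured values; fill the rest with the locally rescaled prediction.
    filled = sparse_metric.astype(float).copy()
    filled[ys, xs] = px_scale * relative_pred[ys, xs]
    return filled
```

In the paper's pipeline, the pre-filled map produced this way (together with the normalized prediction) then conditions the fine-stage MDE model, which corrects residual noise rather than regressing depth from scratch.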