Depth Anything with Any Prior

May 15, 2025
Authors: Zehan Wang, Siyu Chen, Lihe Yang, Jialei Wang, Ziang Zhang, Hengshuang Zhao, Zhou Zhao
cs.AI

Abstract

This work presents Prior Depth Anything, a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction, generating accurate, dense, and detailed metric depth maps for any scene. To this end, we design a coarse-to-fine pipeline to progressively integrate the two complementary depth sources. First, we introduce pixel-level metric alignment and distance-aware weighting to pre-fill diverse metric priors by explicitly using depth prediction. It effectively narrows the domain gap between prior patterns, enhancing generalization across varying scenarios. Second, we develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise of depth priors. By conditioning on the normalized pre-filled prior and prediction, the model further implicitly merges the two complementary depth sources. Our model showcases impressive zero-shot generalization across depth completion, super-resolution, and inpainting over 7 real-world datasets, matching or even surpassing previous task-specific methods. More importantly, it performs well on challenging, unseen mixed priors and enables test-time improvements by switching prediction models, providing a flexible accuracy-efficiency trade-off while evolving with advancements in MDE models.
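
To make the coarse stage concrete, the sketch below illustrates what the pre-filling step could look like. It is a minimal illustration under assumptions, not the authors' released implementation: the function name align_and_prefill, the global least-squares scale/shift fit, and the k-nearest-neighbor residual propagation are hypothetical stand-ins for the paper's pixel-level metric alignment and distance-aware weighting, and the conditioned MDE refinement stage is only indicated in a comment.

```python
import numpy as np

def align_and_prefill(pred_rel, prior, mask, k=4, eps=1e-6):
    """Coarse-stage sketch: align a complete, relative depth prediction
    to sparse metric measurements, then pre-fill the missing pixels.

    pred_rel : (H, W) relative depth prediction (complete, scale-free)
    prior    : (H, W) metric depth prior, valid only where mask is True
    mask     : (H, W) boolean validity mask of the prior
    """
    ys, xs = np.nonzero(mask)

    # Metric alignment (assumed global here): least-squares scale/shift
    # mapping the prediction onto the metric prior at valid pixels.
    A = np.stack([pred_rel[ys, xs], np.ones(ys.size)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, prior[ys, xs], rcond=None)
    aligned = scale * pred_rel + shift

    # Residual between the prior and the aligned prediction at valid
    # pixels, propagated to missing pixels with distance-aware weights
    # so that nearby measurements dominate the local correction.
    residual = prior[ys, xs] - aligned[ys, xs]
    pts = np.stack([ys, xs], axis=1).astype(np.float64)

    filled = aligned.copy()
    for y, x in zip(*np.nonzero(~mask)):
        d = np.linalg.norm(pts - (y, x), axis=1)
        nn = np.argpartition(d, min(k, d.size) - 1)[:k]
        w = 1.0 / (d[nn] + eps)            # closer priors weigh more
        filled[y, x] += np.dot(w, residual[nn]) / w.sum()

    filled[mask] = prior[mask]             # keep exact measurements
    # Fine stage (not sketched): a conditioned MDE network takes the
    # normalized pre-filled map plus the raw prediction as conditions
    # and refines the noise remaining in the prior.
    return filled
```

In practice the naive per-pixel loop would be replaced by a vectorized or KD-tree neighbor lookup; the sketch only conveys the data flow the abstract describes: align the prediction to the metric prior, correct locally with distance-aware weights, then hand the normalized result to the conditioned refiner.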
