AnyDepth: Depth Estimation Made Easy

January 6, 2026
Authors: Zeyu Ren, Zeyu Zhang, Wukai Li, Qingxiang Liu, Hao Tang
cs.AI

Abstract

Monocular depth estimation aims to recover the depth of a 3D scene from a single 2D image. Recent work has made significant progress, but its reliance on large-scale datasets and complex decoders limits both efficiency and generalization. In this paper, we propose a lightweight, data-centric framework for zero-shot monocular depth estimation. First, we adopt DINOv3 as the visual encoder to obtain high-quality dense features. Second, to address the inherent drawbacks of DPT's complex structure, we design the Simple Depth Transformer (SDT), a compact transformer-based decoder. Compared with DPT, SDT uses a single-path feature fusion and upsampling process that reduces the computational overhead of cross-scale feature fusion, achieving higher accuracy with approximately 85%-89% fewer parameters. Furthermore, we propose a quality-based filtering strategy that removes harmful samples, reducing dataset size while improving overall training quality. Extensive experiments on five benchmarks demonstrate that our framework surpasses DPT in accuracy. This work highlights the importance of balancing model design and data quality for efficient, generalizable zero-shot depth estimation. Code: https://github.com/AIGeeksGroup/AnyDepth. Website: https://aigeeksgroup.github.io/AnyDepth.
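
The SDT decoder is described here only at a high level: DINOv3 patch features are fused through a single path and upsampled to a dense depth map, avoiding DPT's per-scale reassemble-and-fuse branches. The PyTorch sketch below illustrates that single-path idea; the class name, dimensions, and the choice to concatenate four encoder layers and project them once are illustrative assumptions, not the authors' released SDT implementation.

```python
import torch
import torch.nn as nn

class SimpleDepthDecoderSketch(nn.Module):
    """Illustrative single-path decoder: fuse multi-layer ViT features once,
    refine with a small transformer, then upsample to a dense depth map.
    Hypothetical stand-in for SDT, not the paper's released code."""
    def __init__(self, embed_dim=768, num_fused_layers=4, dec_dim=256,
                 refine_blocks=2, num_heads=8):
        super().__init__()
        # Single fusion step: concatenate the selected encoder layers and
        # project once, instead of DPT-style per-scale reassemble branches.
        self.fuse = nn.Linear(embed_dim * num_fused_layers, dec_dim)
        block = nn.TransformerEncoderLayer(d_model=dec_dim, nhead=num_heads,
                                           dim_feedforward=4 * dec_dim,
                                           batch_first=True, norm_first=True)
        self.refine = nn.TransformerEncoder(block, num_layers=refine_blocks)
        # Single upsampling path: 4x twice = 16x, matching a 16-pixel patch.
        self.head = nn.Sequential(
            nn.Conv2d(dec_dim, dec_dim // 2, 3, padding=1), nn.GELU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dec_dim // 2, dec_dim // 4, 3, padding=1), nn.GELU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dec_dim // 4, 1, 1),
            nn.ReLU(),  # depth is non-negative
        )

    def forward(self, layer_tokens, grid_hw):
        # layer_tokens: list of (B, N, embed_dim) patch tokens taken from
        # several encoder layers; grid_hw: (h, w) of the patch-token grid.
        x = self.fuse(torch.cat(layer_tokens, dim=-1))  # (B, N, dec_dim)
        x = self.refine(x)
        B, N, C = x.shape
        h, w = grid_hw
        x = x.transpose(1, 2).reshape(B, C, h, w)       # tokens -> 2D grid
        return self.head(x)                             # (B, 1, 16h, 16w)

# Usage with dummy features standing in for DINOv3 outputs (224x224 input,
# 16-pixel patches -> a 14x14 token grid):
feats = [torch.randn(2, 14 * 14, 768) for _ in range(4)]
depth = SimpleDepthDecoderSketch()(feats, (14, 14))
print(depth.shape)  # torch.Size([2, 1, 224, 224])
```

Because all cross-layer information is merged in one projection and refined by a small transformer, the parameter count stays well below that of a multi-branch decoder that keeps separate convolutional fusion stages per scale, which is consistent with the 85%-89% reduction the abstract reports.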
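The quality-based filtering strategy can likewise be pictured as scoring every training sample and keeping only the highest-quality fraction. The abstract does not specify the quality metric, so the valid-depth-ratio score below is a stand-in assumption purely for illustration.

```python
import torch

def filter_by_quality(samples, score_fn, keep_ratio=0.8):
    """Keep the highest-scoring fraction of training samples.

    The scoring function is pluggable; the paper's actual quality
    criterion is not given in the abstract, so this is a sketch.
    """
    scores = torch.tensor([score_fn(s) for s in samples])
    k = max(1, int(keep_ratio * len(samples)))
    keep = torch.topk(scores, k).indices.tolist()
    return [samples[i] for i in keep]

def valid_depth_ratio(sample):
    # Stand-in quality score: fraction of pixels with a finite,
    # positive ground-truth depth label.
    d = sample["depth"]
    return ((d > 0) & torch.isfinite(d)).float().mean().item()

# Example: two dummy samples, one with an entirely invalid depth map.
good = {"depth": torch.rand(1, 64, 64) + 0.1}
bad = {"depth": torch.full((1, 64, 64), float("nan"))}
kept = filter_by_quality([good, bad], valid_depth_ratio, keep_ratio=0.5)
print(len(kept))  # 1 -- the harmful sample is dropped
```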