Depth Anything V2
June 13, 2024
Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
cs.AI
Abstract
This work presents Depth Anything V2. Without pursuing fancy techniques, we
aim to reveal crucial findings to pave the way towards building a powerful
monocular depth estimation model. Notably, compared with V1, this version
produces much finer and more robust depth predictions through three key
practices: 1) replacing all labeled real images with synthetic images, 2)
scaling up the capacity of our teacher model, and 3) teaching student models
via the bridge of large-scale pseudo-labeled real images. Compared with the
latest models built on Stable Diffusion, our models are significantly more
efficient (more than 10x faster) and more accurate. We offer models of
different scales (ranging from 25M to 1.3B params) to support extensive
scenarios. Benefiting from their strong generalization capability, we fine-tune
them with metric depth labels to obtain our metric depth models. In addition to
our models, considering the limited diversity and frequent noise in current
test sets, we construct a versatile evaluation benchmark with precise
annotations and diverse scenes to facilitate future research.
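The three key practices describe a teacher-student pipeline: a large teacher is trained on precisely labeled synthetic images, it pseudo-labels large-scale unlabeled real images, and students of various sizes are trained on those pseudo labels. The sketch below illustrates this flow only; the toy convolutional network and random tensors are stand-ins for the actual DINOv2/DPT models and datasets, and all names and shapes here are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in network; the paper uses DINOv2 encoders with a DPT head
# (roughly 25M params for the smallest model up to 1.3B for the teacher).
def make_depth_net(width):
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, 1, 3, padding=1),
    )

def train(model, loader, epochs=1, lr=1e-4):
    """Generic supervised loop: fit predicted depth to the given labels."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, depths in loader:
            opt.zero_grad()
            loss = nn.functional.l1_loss(model(images), depths)
            loss.backward()
            opt.step()
    return model

@torch.no_grad()
def pseudo_label(teacher, images):
    """The frozen teacher annotates unlabeled real images."""
    teacher.eval()
    return teacher(images)

# Dummy tensors standing in for synthetic (labeled) and real (unlabeled) images.
synthetic_imgs, synthetic_depths = torch.rand(16, 3, 64, 64), torch.rand(16, 1, 64, 64)
real_imgs = torch.rand(32, 3, 64, 64)

# Step 1: train a large-capacity teacher on synthetic images only.
teacher = train(make_depth_net(width=64),
                DataLoader(TensorDataset(synthetic_imgs, synthetic_depths), batch_size=8))

# Step 2: bridge the synthetic-to-real gap with pseudo-labeled real images.
pseudo_depths = pseudo_label(teacher, real_imgs)

# Step 3: train (smaller) student models on the pseudo-labeled real data.
student = train(make_depth_net(width=16),
                DataLoader(TensorDataset(real_imgs, pseudo_depths), batch_size=8))
```

In this scheme the students never see the noisy real labels at all; synthetic data supplies precise supervision for the teacher, and the pseudo-labeled real images supply the diversity that lets the students generalize.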