Depth Anything V2
June 13, 2024
Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
cs.AI
Abstract
This work presents Depth Anything V2. Without pursuing fancy techniques, we
aim to reveal crucial findings to pave the way towards building a powerful
monocular depth estimation model. Notably, compared with V1, this version
produces much finer and more robust depth predictions through three key
practices: 1) replacing all labeled real images with synthetic images, 2)
scaling up the capacity of our teacher model, and 3) teaching student models
via the bridge of large-scale pseudo-labeled real images. Compared with the
latest models built on Stable Diffusion, our models are significantly more
efficient (more than 10x faster) and more accurate. We offer models of
different scales (ranging from 25M to 1.3B params) to support extensive
scenarios. Benefiting from their strong generalization capability, we fine-tune
them with metric depth labels to obtain our metric depth models. In addition to
our models, considering the limited diversity and frequent noise in current
test sets, we construct a versatile evaluation benchmark with precise
annotations and diverse scenes to facilitate future research.
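
The three practices listed in the abstract amount to a synthetic-to-real distillation pipeline: train a high-capacity teacher on precisely labeled synthetic images, use it to pseudo-label a large pool of unlabeled real images, then train smaller students on those pseudo-labels. The sketch below illustrates that recipe in PyTorch; the tiny network, the dummy tensors, and the affine-invariant loss are illustrative placeholders chosen for clarity, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDepthNet(nn.Module):
    """Placeholder network standing in for the paper's ViT-based depth models."""
    def __init__(self, width: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 3, padding=1),
        )

    def forward(self, x):              # x: (B, 3, H, W)
        return self.net(x).squeeze(1)  # (B, H, W) relative depth

def affine_invariant_loss(pred, target, eps=1e-6):
    """Compare depths up to scale and shift, a common choice for relative-depth
    supervision (shown for illustration; not necessarily the paper's exact loss)."""
    def norm(d):
        d = d.flatten(1)
        t = d.median(dim=1, keepdim=True).values
        s = (d - t).abs().mean(dim=1, keepdim=True)
        return (d - t) / (s + eps)
    return F.l1_loss(norm(pred), norm(target))

# Dummy stand-ins for the synthetic (labeled) and real (unlabeled) image pools.
synthetic_images = torch.rand(8, 3, 64, 64)
synthetic_depth  = torch.rand(8, 64, 64)
real_images      = torch.rand(16, 3, 64, 64)

# Step 1: train a high-capacity teacher on precise synthetic labels only.
teacher = TinyDepthNet(width=64)
opt = torch.optim.AdamW(teacher.parameters(), lr=1e-4)
for _ in range(10):
    loss = affine_invariant_loss(teacher(synthetic_images), synthetic_depth)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: pseudo-label the unlabeled real images with the frozen teacher.
teacher.eval()
with torch.no_grad():
    pseudo_depth = teacher(real_images)

# Step 3: train a smaller student on the pseudo-labeled real images, bridging the
# synthetic-to-real gap without inheriting noisy real-world annotations.
student = TinyDepthNet(width=16)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
for _ in range(10):
    loss = affine_invariant_loss(student(real_images), pseudo_depth)
    opt.zero_grad()
    loss.backward()
    opt.step()
```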
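
For readers who simply want depth maps from the released models of different scales, the snippet below shows one possible inference path through the Hugging Face transformers depth-estimation pipeline; the repository id is an assumption about where the checkpoints are hosted, so substitute whichever scale and hub location applies.

```python
from PIL import Image
from transformers import pipeline

# Hypothetical hub id; the Small/Base/Large checkpoints are assumed to follow the
# naming pattern "depth-anything/Depth-Anything-V2-<Scale>-hf".
pipe = pipeline(task="depth-estimation",
                model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("photo.jpg")            # any RGB image on disk
result = pipe(image)
result["depth"].save("photo_depth.png")    # relative depth rendered as a grayscale PNG
```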