

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

January 19, 2024
Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
cs.AI

Abstract

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circumstances. To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos. It demonstrates impressive generalization ability. Further, through fine-tuning it with metric depth information from NYUv2 and KITTI, new SOTAs are set. Our better depth model also results in a better depth-conditioned ControlNet. Our models are released at https://github.com/LiheYoung/Depth-Anything.
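The sketch below is a rough, hypothetical illustration of the two strategies the abstract describes, not the authors' released training code: a frozen teacher pseudo-labels unlabeled images, the student is trained on strongly perturbed copies of those images (the harder optimization target), and an auxiliary cosine loss aligns the student's features with a frozen pre-trained encoder (the semantic-prior supervision). The tiny placeholder networks, module names, augmentations, and loss weighting are all assumptions made for illustration; the real models, losses, and data engine are in the repository linked above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision.transforms as T

    # Tiny stand-in networks so the sketch runs end to end; they are NOT the
    # paper's architectures (the released models use much larger backbones).
    class TinyDepthNet(nn.Module):
        def __init__(self, feat_dim=32):
            super().__init__()
            self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
            self.head = nn.Conv2d(feat_dim, 1, 3, padding=1)

        def forward(self, x):
            feat = self.encoder(x)
            return self.head(feat), feat

    student = TinyDepthNet()
    teacher = TinyDepthNet().eval()          # frozen pseudo-labeler (data engine stand-in)
    frozen_encoder = nn.Conv2d(3, 32, 3, padding=1).eval()  # stand-in for a pre-trained semantic encoder

    # Strong perturbations applied only to the student's input, so the student
    # must work harder than simply copying the teacher's predictions.
    strong_aug = T.Compose([
        T.ColorJitter(0.4, 0.4, 0.4, 0.1),
        T.RandomGrayscale(p=0.2),
    ])

    def unlabeled_step(images, alpha=0.1):
        """One hypothetical training step on a batch of unlabeled images."""
        with torch.no_grad():
            pseudo_depth, _ = teacher(images)      # pseudo depth labels on the clean images
            target_feat = frozen_encoder(images)   # semantic features to align with

        aug = strong_aug(images)                   # challenging optimization target
        pred_depth, feat = student(aug)

        depth_loss = F.l1_loss(pred_depth, pseudo_depth)                  # pseudo-label supervision
        align = 1 - F.cosine_similarity(feat, target_feat, dim=1).mean()  # semantic-prior alignment
        return depth_loss + alpha * align

    loss = unlabeled_step(torch.rand(2, 3, 64, 64))
    loss.backward()

In this hedged sketch, alpha balances pseudo-label supervision against feature alignment; the actual loss formulation, perturbation set, and choice of pre-trained encoder used by the authors are documented in the released code.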