Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Abstract

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Rather than pursuing novel technical modules, we aim to build a simple yet powerful foundation model that handles any image under any circumstances. To this end, we scale up the dataset by designing a data engine that collects and automatically annotates large-scale unlabeled data (~62M), which significantly enlarges data coverage and thereby reduces the generalization error. We investigate two simple yet effective strategies that make scaling up the data promising. First, we create a more challenging optimization target by leveraging data augmentation tools, which compels the model to actively seek extra visual knowledge and acquire robust representations. Second, we develop an auxiliary supervision that enforces the model to inherit rich semantic priors from pre-trained encoders. We extensively evaluate the model's zero-shot capabilities on six public datasets and on randomly captured photos, where it demonstrates impressive generalization ability. Further, fine-tuning it with metric depth information from NYUv2 and KITTI sets new state-of-the-art results. Our better depth model also yields a better depth-conditioned ControlNet. Our models are released at https://github.com/LiheYoung/Depth-Anything.
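
The abstract does not specify the form of the auxiliary supervision. As a minimal sketch, one plausible form is a cosine feature-alignment term that pulls the depth model's per-pixel features toward those of a frozen pre-trained encoder. The names `cosine_similarity` and `feature_alignment_loss`, and the optional `tolerance` margin, are illustrative assumptions and are not taken from the released code.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def feature_alignment_loss(
    student_feats: Sequence[Sequence[float]],
    frozen_feats: Sequence[Sequence[float]],
    tolerance: Optional[float] = None,
) -> float:
    """Mean of (1 - cosine similarity) over per-pixel feature pairs.

    Assumption: if `tolerance` is given, pixels whose similarity already
    exceeds it are skipped, so the student is not forced to copy the
    frozen encoder exactly. Returns 0.0 when no pixel contributes.
    """
    losses = []
    for student, frozen in zip(student_feats, frozen_feats):
        similarity = cosine_similarity(student, frozen)
        if tolerance is not None and similarity > tolerance:
            continue  # already aligned closely enough; no penalty
        losses.append(1.0 - similarity)
    return sum(losses) / len(losses) if losses else 0.0
```

In a training loop, a term of this kind would be added to the depth loss with some weighting; the weighting and the choice of pre-trained encoder are left open here because the abstract does not state them.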
