Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Abstract

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Rather than pursuing novel technical modules, we aim to build a simple yet powerful foundation model that handles any image under any circumstances. To this end, we scale up the dataset by designing a data engine that collects and automatically annotates large-scale unlabeled data (~62M), which significantly enlarges data coverage and thereby reduces generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, we create a more challenging optimization target by leveraging data augmentation tools, which compels the model to actively seek extra visual knowledge and acquire robust representations. Second, we develop an auxiliary supervision that forces the model to inherit rich semantic priors from pre-trained encoders. We extensively evaluate the model's zero-shot capabilities on six public datasets and randomly captured photos, where it demonstrates impressive generalization ability. Further, fine-tuning it with metric depth information from NYUv2 and KITTI sets new state-of-the-art (SOTA) results. Our better depth model also yields a better depth-conditioned ControlNet. Our models are released at https://github.com/LiheYoung/Depth-Anything.
