Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

Abstract

Monocular depth estimation (MDE) aims to predict scene depth from a single RGB image and plays a crucial role in 3D scene understanding. Recent advances in zero-shot MDE leverage normalized depth representations and distillation-based learning to improve generalization across diverse scenes. However, current depth normalization methods for distillation rely on global normalization, which can amplify noise in pseudo-labels and reduce distillation effectiveness. In this paper, we systematically analyze the impact of different depth normalization strategies on pseudo-label distillation. Based on our findings, we propose Cross-Context Distillation, which integrates global and local depth cues to enhance pseudo-label quality. Additionally, we introduce a multi-teacher distillation framework that leverages the complementary strengths of different depth estimation models, leading to more robust and accurate depth predictions. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
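To make the concern about global normalization concrete, the following is a minimal sketch, not the authors' implementation. It assumes a median and mean-absolute-deviation normalization of the kind commonly used for affine-invariant depth; the abstract does not specify the exact normalization, and the function names (`normalize_depth`, `global_loss`, `local_loss`) and the toy data are hypothetical. It illustrates how a few corrupted pseudo-label pixels can distort a globally normalized loss everywhere, while a loss normalized within a clean local crop is unaffected.

```python
import numpy as np

def normalize_depth(d, eps=1e-6):
    """Assumed normalization: subtract the median, divide by the mean absolute deviation."""
    d = np.asarray(d, dtype=np.float64)
    shift = np.median(d)
    scale = np.mean(np.abs(d - shift))
    return (d - shift) / (scale + eps)

def global_loss(student, teacher):
    """L1 distance after normalizing each map with statistics of the whole image."""
    return float(np.mean(np.abs(normalize_depth(student) - normalize_depth(teacher))))

def local_loss(student, teacher, top, left, size):
    """L1 distance after normalizing each map with statistics of one crop only."""
    s = student[top:top + size, left:left + size]
    t = teacher[top:top + size, left:left + size]
    return float(np.mean(np.abs(normalize_depth(s) - normalize_depth(t))))

# Toy data (hypothetical): an 8x8 depth ramp as the student prediction, and a
# teacher pseudo-label that is identical except for a corrupted 2x2 corner.
clean = np.linspace(1.0, 2.0, 64).reshape(8, 8)
student = clean.copy()
teacher = clean.copy()
teacher[0:2, 0:2] += 50.0  # noisy pseudo-label region

g = global_loss(student, teacher)
l = local_loss(student, teacher, top=4, left=4, size=4)  # crop away from the noise
print(f"global-normalized loss: {g:.4f}")
print(f"local-normalized loss:  {l:.4f}")
```

In this toy case the student is penalized under global normalization even though it matches the teacher on every uncorrupted pixel, because the outliers shift the teacher's image-wide statistics. Combining both contexts, as the abstract describes, would weigh such local agreement alongside the global structure.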

