Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
June 18, 2024
Authors: Ning-Hsu Wang, Yu-Lun Liu
cs.AI
Abstract
Accurately estimating depth in 360-degree imagery is crucial for virtual
reality, autonomous navigation, and immersive media applications. Existing
depth estimation methods designed for perspective-view imagery fail when
applied to 360-degree images due to different camera projections and
distortions, whereas 360-degree methods perform worse due to the lack of
labeled data pairs. We propose a new depth estimation framework that utilizes
unlabeled 360-degree data effectively. Our approach uses state-of-the-art
perspective depth estimation models as teacher models to generate pseudo labels
through a six-face cube projection technique, enabling efficient labeling of
depth in 360-degree images. This method leverages the increasing availability
of large datasets. Our approach includes two main stages: offline mask
generation for invalid regions and an online semi-supervised joint training
regime. We tested our approach on benchmark datasets such as Matterport3D and
Stanford2D3D, showing significant improvements in depth estimation accuracy,
particularly in zero-shot scenarios. Our proposed training pipeline can enhance
any 360 monocular depth estimator and demonstrates effective knowledge transfer
across different camera projections and data types. See our project page for
results: https://albert100121.github.io/Depth-Anywhere/
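The core mechanism described in the abstract is pseudo-labeling: each unlabeled 360-degree image is projected onto six 90-degree-FOV cube faces, and a state-of-the-art perspective depth teacher predicts depth on each face. The sketch below illustrates that idea; it is not the authors' implementation, and the face conventions, nearest-neighbor sampling, and the `perspective_teacher` callable are assumptions introduced here for illustration only.

```python
import numpy as np

# (forward, right, down) unit vectors defining each cube face, y-up convention (assumed).
_FACES = {
    "front":  ((0, 0, 1),  (1, 0, 0),  (0, -1, 0)),
    "right":  ((1, 0, 0),  (0, 0, -1), (0, -1, 0)),
    "back":   ((0, 0, -1), (-1, 0, 0), (0, -1, 0)),
    "left":   ((-1, 0, 0), (0, 0, 1),  (0, -1, 0)),
    "top":    ((0, 1, 0),  (1, 0, 0),  (0, 0, 1)),
    "bottom": ((0, -1, 0), (1, 0, 0),  (0, 0, -1)),
}

def equirect_to_cube_faces(erp_img: np.ndarray, face_size: int = 512) -> dict:
    """Sample six 90-degree-FOV perspective faces from an (H, W, C) equirectangular image."""
    H, W = erp_img.shape[:2]
    # Normalized image-plane coordinates of each face pixel, in [-1, 1].
    t = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(t, t)  # u grows to the right, v grows downward
    faces = {}
    for name, (f, r, d) in _FACES.items():
        f, r, d = (np.asarray(x, dtype=float) for x in (f, r, d))
        # Ray through each face pixel: forward + u * right + v * down.
        dirs = f + u[..., None] * r + v[..., None] * d
        dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
        lon = np.arctan2(dirs[..., 0], dirs[..., 2])       # longitude in [-pi, pi]
        lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))  # latitude in [-pi/2, pi/2]
        # Map spherical coordinates to equirectangular pixels (nearest neighbor).
        col = np.clip(((lon / (2 * np.pi) + 0.5) * W).astype(int), 0, W - 1)
        row = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
        faces[name] = erp_img[row, col]
    return faces

def make_pseudo_labels(erp_img: np.ndarray, perspective_teacher) -> dict:
    """Run a perspective depth teacher on each cube face to obtain per-face pseudo depth labels."""
    return {name: perspective_teacher(face)
            for name, face in equirect_to_cube_faces(erp_img).items()}
```

In the training regime the abstract outlines, these per-face pseudo labels would then supervise a 360-degree student model, with the offline-generated masks excluding invalid regions during the online semi-supervised joint training. A minimal sketch of such a masked objective follows, using an assumed simple L1 form rather than the paper's actual loss:

```python
import torch

def masked_pseudo_label_loss(student_depth: torch.Tensor,
                             pseudo_depth: torch.Tensor,
                             valid_mask: torch.Tensor) -> torch.Tensor:
    """L1 discrepancy between the student prediction and teacher pseudo labels,
    averaged only over pixels the offline mask marks as valid."""
    m = valid_mask.float()
    diff = torch.abs(student_depth - pseudo_depth) * m
    return diff.sum() / m.sum().clamp(min=1.0)
```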