Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images

Abstract

Understanding the geometric and semantic properties of a scene is crucial for autonomous navigation, and it is particularly challenging for Unmanned Aerial Vehicle (UAV) navigation. Such information can be obtained by estimating depth and semantic segmentation maps of the surrounding environment; for practical use in autonomous navigation, this estimation must run as close to real time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture that performs the two tasks accurately and rapidly, and we validate its effectiveness on the MidAir and Aeroscapes benchmark datasets. Our joint architecture is competitive with or superior to other single-task and joint architectures while running fast, at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, with a low memory footprint. All code for training and prediction is available at: https://github.com/Malga-Vision/Co-SemDepth
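
To put the reported throughput in per-frame terms, the sketch below converts frames per second into latency and shows one simple way such a throughput figure could be measured. It is only an illustration: `frames_per_second`, `latency_ms`, and the `predict` callable are hypothetical names and are not taken from the Co-SemDepth repository, and the timing loop is an assumed methodology, not necessarily the one the authors used.

```python
import time


def frames_per_second(predict, frames, warmup=2):
    """Average throughput of `predict` over `frames`, in frames per second.

    `predict` is any callable that processes one frame (hypothetical
    stand-in for a model's forward pass).
    """
    # Warm-up calls are excluded from timing (one-time setup costs).
    for frame in frames[:warmup]:
        predict(frame)
    start = time.perf_counter()
    for frame in frames:
        predict(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def latency_ms(fps):
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


# 20.2 FPS corresponds to roughly 49.5 ms per frame.
print(round(latency_ms(20.2), 1))
```

At 20.2 FPS, each frame takes about 49.5 ms to produce both the depth map and the semantic map.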