Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
March 23, 2025
Authors: Yara AlaaEldin, Francesca Odone
cs.AI
Abstract
Understanding the geometric and semantic properties of the scene is crucial
in autonomous navigation, and particularly challenging in the case of Unmanned
Aerial Vehicle (UAV) navigation. Such information may be obtained by
estimating depth and semantic segmentation maps of the surrounding
environment, and for practical use in autonomous navigation, the procedure
must be performed as close to real-time as possible. In this paper, we
leverage monocular cameras on aerial robots to predict depth and semantic
maps in low-altitude unstructured environments. We propose a joint
deep-learning architecture that performs the two tasks accurately and
rapidly, and validate its effectiveness on the MidAir and Aeroscapes
benchmark datasets. Our joint architecture proves competitive with or
superior to other single- and joint-architecture methods while running fast,
predicting at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, and it has a low
memory footprint. All code for training and prediction can be found at this
link: https://github.com/Malga-Vision/Co-SemDepth
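The core idea of a joint architecture (a single shared encoder whose features feed two task-specific heads, one for depth regression and one for semantic segmentation) can be illustrated with a minimal NumPy sketch. All layer sizes, names, and the single-layer heads below are illustrative assumptions for exposition, not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyJointModel:
    """Toy shared-encoder model: one feature extractor feeds two heads.

    Shapes and single-layer linear heads are illustrative only; the
    paper's architecture is a full convolutional network.
    """

    def __init__(self, in_dim=16, feat_dim=8, n_classes=5):
        # Shared encoder weights (used by both tasks).
        self.W_enc = rng.standard_normal((in_dim, feat_dim)) * 0.1
        # Depth head: regresses one value per input.
        self.W_depth = rng.standard_normal((feat_dim, 1)) * 0.1
        # Segmentation head: one logit per semantic class.
        self.W_seg = rng.standard_normal((feat_dim, n_classes)) * 0.1

    def forward(self, x):
        feat = np.maximum(x @ self.W_enc, 0.0)  # shared ReLU features
        depth = feat @ self.W_depth             # per-pixel depth estimate
        logits = feat @ self.W_seg              # per-pixel class logits
        return depth, logits

model = ToyJointModel()
pixels = rng.standard_normal((4, 16))  # 4 "pixels" with 16-dim features
depth, logits = model.forward(pixels)
print(depth.shape, logits.shape)  # (4, 1) (4, 5)
```

Sharing the encoder is what gives joint architectures their speed and memory advantage over running two separate networks: the expensive feature extraction is computed once and reused by both heads.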