Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
March 23, 2025
Authors: Yara AlaaEldin, Francesca Odone
cs.AI
Abstract
Understanding the geometric and semantic properties of the scene is crucial
in autonomous navigation, and particularly challenging in the case of Unmanned
Aerial Vehicle (UAV) navigation. Such information may be obtained by
estimating depth and semantic segmentation maps of the surrounding
environment, and for practical use in autonomous navigation, the procedure
must be performed as close to real-time as possible. In this paper, we
leverage monocular cameras on aerial robots to predict depth and semantic
maps in low-altitude unstructured environments. We propose a joint
deep-learning architecture that performs the two tasks accurately and
rapidly, and validate its effectiveness on the MidAir and Aeroscapes
benchmark datasets. Our joint architecture proves competitive with or
superior to other single- and joint-architecture methods while running fast,
predicting at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, and it has a low
memory footprint. All code for training and prediction can be found at this
link: https://github.com/Malga-Vision/Co-SemDepth
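The core idea of a joint architecture (a single shared encoder whose features feed two task-specific heads, one for depth regression and one for semantic segmentation) can be illustrated with a minimal NumPy sketch. All layer sizes, names, and the single-layer heads below are illustrative assumptions for exposition, not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyJointModel:
    """Toy shared-encoder model: one feature extractor feeds two heads.

    Shapes and single-layer linear heads are illustrative only; the
    paper's architecture is a full convolutional network.
    """

    def __init__(self, in_dim=16, feat_dim=8, n_classes=5):
        # Shared encoder weights (used by both tasks).
        self.W_enc = rng.standard_normal((in_dim, feat_dim)) * 0.1
        # Depth head: regresses one value per input.
        self.W_depth = rng.standard_normal((feat_dim, 1)) * 0.1
        # Segmentation head: one logit per semantic class.
        self.W_seg = rng.standard_normal((feat_dim, n_classes)) * 0.1

    def forward(self, x):
        feat = np.maximum(x @ self.W_enc, 0.0)  # shared ReLU features
        depth = feat @ self.W_depth             # per-pixel depth estimate
        logits = feat @ self.W_seg              # per-pixel class logits
        return depth, logits

model = ToyJointModel()
pixels = rng.standard_normal((4, 16))  # 4 "pixels" with 16-dim features
depth, logits = model.forward(pixels)
print(depth.shape, logits.shape)  # (4, 1) (4, 5)
```

Sharing the encoder is what gives joint architectures their speed and memory advantage over running two separate networks: the expensive feature extraction is computed once and reused by both heads.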