Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images

Abstract

Understanding the geometric and semantic properties of a scene is crucial for autonomous navigation, and it is particularly challenging for Unmanned Aerial Vehicle (UAV) navigation. Such information can be obtained by estimating depth and semantic segmentation maps of the surrounding environment; for practical use in autonomous navigation, the estimation must run as close to real time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture that performs the two tasks accurately and rapidly, and we validate its effectiveness on the MidAir and Aeroscapes benchmark datasets. Our joint architecture proves competitive with or superior to other single and joint architecture methods while running at 20.2 FPS on a single NVIDIA Quadro P5000 GPU with a low memory footprint. All code for training and prediction is available at: https://github.com/Malga-Vision/Co-SemDepth
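
To put the reported throughput in context, 20.2 FPS corresponds to roughly 49.5 ms per frame. The sketch below shows one common way such a figure could be measured; it is not taken from the Co-SemDepth repository. The names `measure_fps`, `fps_to_latency_ms`, and `dummy_predict` are hypothetical, and `dummy_predict` is only a placeholder standing in for a joint depth and segmentation model.

```python
import time


def fps_to_latency_ms(fps):
    """Convert a throughput in frames per second to per-frame latency in ms."""
    return 1000.0 / fps


def measure_fps(predict, frames, warmup=2):
    """Return the average frames per second of `predict` over `frames`.

    `predict` is any callable taking one frame (hypothetical interface).
    The first `warmup` frames are run once without timing, so one-off
    initialisation costs do not distort the result.
    """
    for frame in frames[:warmup]:
        predict(frame)

    start = time.perf_counter()
    for frame in frames:
        predict(frame)
    # Note: with a GPU model, the device would have to be synchronised
    # before reading the timer, since GPU work is launched asynchronously.
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def dummy_predict(frame):
    """Placeholder for a joint model: returns a depth and a semantic output."""
    time.sleep(0.005)  # simulated inference time, not a real measurement
    return {"depth": frame, "semantics": frame}


if __name__ == "__main__":
    frames = list(range(20))  # stand-in for a sequence of images
    print(f"measured throughput: {measure_fps(dummy_predict, frames):.1f} FPS")
    print(f"20.2 FPS is about {fps_to_latency_ms(20.2):.1f} ms per frame")
```

Averaging over many frames after a warm-up phase gives a more stable estimate than timing a single prediction.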
