Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data
July 11, 2024
Authors: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer
cs.AI
Abstract
Top-down Bird's Eye View (BEV) maps are a popular representation for ground
robot navigation due to their richness and flexibility for downstream tasks.
While recent methods have shown promise for predicting BEV maps from
First-Person View (FPV) images, their generalizability is limited to small
regions captured by current autonomous vehicle-based datasets. In this context,
we show that a more scalable approach towards generalizable map prediction can
be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary
for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It
Anywhere (MIA), a data engine that enables seamless curation and modeling of
labeled map prediction data from existing open-source map platforms. Using our
MIA data engine, we display the ease of automatically collecting a dataset of
1.2 million pairs of FPV images & BEV maps encompassing diverse geographies,
landscapes, environmental factors, camera models & capture scenarios. We
further train a simple camera model-agnostic model on this data for BEV map
prediction. Extensive evaluations using established benchmarks and our dataset
show that the data curated by MIA enables effective pretraining for
generalizable BEV map prediction, with zero-shot performance far exceeding
baselines trained on existing datasets by 35%. Our analysis highlights the
promise of using large-scale public maps for developing & testing generalizable
BEV perception, paving the way for more robust autonomous navigation.
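Both Mapillary and OpenStreetMap index the world with the standard Web Mercator ("slippy-map") tiling scheme, which is how geo-tagged FPV images can be aligned with BEV map tiles during curation. As an illustrative aside (not code from the MIA data engine itself), the lat/lon-to-tile conversion can be sketched as:

```python
import math

def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert a WGS84 latitude/longitude to slippy-map tile indices
    (x, y) at the given zoom level, following the standard OSM scheme."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

For example, the origin (0°, 0°) falls into tile (0, 0) at zoom 0, since a single tile covers the whole map at that level.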