

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

July 11, 2024
作者: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer
cs.AI

Abstract

Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation.
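The abstract describes pairing ego-centric FPV captures with BEV semantic maps derived from OpenStreetMap. The core geometric step in any such data engine is rasterizing map geometry into an ego-centric top-down grid around each capture pose. The sketch below illustrates that step only; it is not MIA's actual pipeline, and all names, grid parameters, and the axis-aligned-box simplification are assumptions for illustration.

```python
import math

# Hypothetical BEV rasterization step of an MIA-style data engine:
# given a capture pose (local metric x, y and heading) and semantic
# polygons (simplified here to axis-aligned boxes, as OSM geometry
# might be preprocessed), paint an ego-centric top-down label grid.

GRID = 64            # cells per side
METERS = 32.0        # grid covers METERS x METERS; ego at bottom-center
RES = METERS / GRID  # meters per cell

def world_to_grid(px, py, ego_x, ego_y, heading):
    """Rotate a world point into the ego frame (heading points along +y,
    i.e. 'forward'), then convert meters to grid indices. The ego sits
    at the bottom-center row, looking 'up' the grid."""
    dx, dy = px - ego_x, py - ego_y
    c, s = math.cos(-heading), math.sin(-heading)
    fx = c * dx - s * dy              # lateral offset (right positive)
    fy = s * dx + c * dy              # forward offset
    col = int(fx / RES + GRID / 2)
    row = int(GRID - 1 - fy / RES)
    return row, col

def rasterize(polygons, ego_x, ego_y, heading):
    """polygons: list of (label_id, ((x0, y0), (x1, y1))) boxes given by
    two corners. Returns a GRID x GRID list-of-lists label map (0 = void).
    Boxes are densely sampled; fine for a sketch, not for production."""
    bev = [[0] * GRID for _ in range(GRID)]
    for label, ((x0, y0), (x1, y1)) in polygons:
        steps = int(max(x1 - x0, y1 - y0) / RES) * 2 + 1
        for i in range(steps + 1):
            for j in range(steps + 1):
                px = x0 + (x1 - x0) * i / steps
                py = y0 + (y1 - y0) * j / steps
                r, c = world_to_grid(px, py, ego_x, ego_y, heading)
                if 0 <= r < GRID and 0 <= c < GRID:
                    bev[r][c] = label
    return bev
```

In a real engine, these grids would be rendered from OSM vector data in a projected coordinate frame around each Mapillary capture and paired with the corresponding FPV image as a training label.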

