MegaLoc: One Retrieval to Place Them All
February 24, 2025
Authors: Gabriele Berton, Carlo Masone
cs.AI
Abstract
Retrieving images from the same location as a given query is an important
component of multiple computer vision tasks, like Visual Place Recognition,
Landmark Retrieval, Visual Localization, 3D reconstruction, and SLAM. However,
existing solutions are built specifically for one of these tasks, and
are known to fail when the requirements change slightly or when they encounter
out-of-distribution data. In this paper we combine a variety of existing
methods, training techniques, and datasets to train a retrieval model, called
MegaLoc, that is performant on multiple tasks. We find that MegaLoc (1)
achieves state-of-the-art results on a large number of Visual Place Recognition
datasets, (2) obtains impressive results on common Landmark Retrieval datasets, and (3)
sets a new state of the art for Visual Localization on the LaMAR datasets,
simply by swapping the retrieval method in an existing localization
pipeline. The code for MegaLoc is available at
https://github.com/gmberton/MegaLoc
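
Because MegaLoc is presented as a drop-in retrieval component, the sketch below shows one plausible way to use such a model: encode database and query images into descriptors, then rank database images by descriptor similarity. This is a minimal sketch, not the authors' reference code: the torch.hub entry point name (get_trained_model), the input resolution, the normalization, and the placeholder image paths are assumptions mirroring the author's earlier model releases; consult https://github.com/gmberton/MegaLoc for the exact API.

```python
# Minimal retrieval sketch (assumptions noted in the text above).
import torch
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical hub entry point, mirroring the author's earlier releases
# (CosPlace, EigenPlaces); check the MegaLoc README for the exact call.
model = torch.hub.load("gmberton/MegaLoc", "get_trained_model").eval().to(device)

preprocess = T.Compose([
    T.Resize((322, 322)),  # assumed input resolution
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def describe(paths):
    """Encode a list of image paths into one descriptor per image."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return model(batch.to(device)).cpu()  # assumed shape: (num_images, dim)

# Placeholder paths: replace with your own database/map and query images.
database_paths = ["db/img_0001.jpg", "db/img_0002.jpg"]
query_paths = ["queries/query_0001.jpg"]

db_desc = describe(database_paths)   # (N, D)
q_desc = describe(query_paths)       # (M, D)

# Retrieval is nearest-neighbor search in descriptor space; assuming the
# descriptors are L2-normalized, dot-product similarity ranks the neighbors.
similarity = q_desc @ db_desc.T                                  # (M, N)
top_k = similarity.topk(k=min(5, db_desc.shape[0]), dim=1).indices
print(top_k)  # indices of the best-matching database images per query
```

With L2-normalized descriptors, ranking by dot product is equivalent to ranking by Euclidean distance, so the same top-k indices can feed a Visual Place Recognition evaluation, a landmark retrieval ranking, or the image-pair selection step of a localization, SLAM, or 3D reconstruction pipeline.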