GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization
May 19, 2025
Authors: Pengyue Jia, Seongheon Park, Song Gao, Xiangyu Zhao, Yixuan Li
cs.AI
Abstract
Worldwide image geolocalization, the task of predicting GPS coordinates from
images taken anywhere on Earth, poses a fundamental challenge due to the vast
diversity in visual content across regions. While recent approaches adopt a
two-stage pipeline of retrieving candidates and selecting the best match, they
typically rely on simplistic similarity heuristics and point-wise supervision,
failing to model spatial relationships among candidates. In this paper, we
propose GeoRanker, a distance-aware ranking framework that leverages large
vision-language models to jointly encode query-candidate interactions and
predict geographic proximity. In addition, we introduce a multi-order distance
loss that ranks both absolute and relative distances, enabling the model to
reason over structured spatial relationships. To support this, we curate
GeoRanking, the first dataset explicitly designed for geographic ranking tasks
with multimodal candidate information. GeoRanker achieves state-of-the-art
results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly
outperforming current best methods.
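
The abstract names a multi-order distance loss over absolute and relative distances but does not give its form. The following is only an illustrative sketch of one way such a loss could look, not the authors' formulation: it assumes a logistic pairwise ranking loss, great-circle (haversine) distance as the target, and a hypothetical `weight` parameter that balances the two orders. All function names are invented for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_rank_loss(scores, targets):
    """Logistic pairwise loss: an item with a smaller target should score higher."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] < targets[j]:
                total += softplus(-(scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


def multi_order_loss(scores, query, candidates, weight=1.0):
    """Assumed two-term loss: rank distances, then rank distance differences."""
    dists = [haversine_km(query, c) for c in candidates]
    # First order (absolute): closer candidates should receive higher scores.
    first = pairwise_rank_loss(scores, dists)
    # Second order (relative): score gaps should mirror gaps in distance.
    n = len(candidates)
    idx = [(i, j) for i in range(n) for j in range(n) if i != j]
    score_gaps = [scores[i] - scores[j] for i, j in idx]
    dist_gaps = [dists[i] - dists[j] for i, j in idx]
    second = pairwise_rank_loss(score_gaps, dist_gaps)
    return first + weight * second


# Query near Paris; candidates near Versailles, London, and Tokyo.
query = (48.8566, 2.3522)
candidates = [(48.8049, 2.1204), (51.5074, -0.1278), (35.6762, 139.6503)]
good = multi_order_loss([3.0, 1.0, -2.0], query, candidates)
bad = multi_order_loss([-2.0, 1.0, 3.0], query, candidates)
```

Under these assumptions, scores that order the candidates from nearest to farthest (`good`) incur a lower loss than the reversed ordering (`bad`). In a real system the scores would come from the vision-language model rather than being fixed by hand.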