
GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

May 19, 2025
Authors: Pengyue Jia, Seongheon Park, Song Gao, Xiangyu Zhao, Yixuan Li
cs.AI

Abstract

Worldwide image geolocalization, the task of predicting GPS coordinates from images taken anywhere on Earth, poses a fundamental challenge due to the vast diversity in visual content across regions. While recent approaches adopt a two-stage pipeline of retrieving candidates and selecting the best match, they typically rely on simplistic similarity heuristics and point-wise supervision, failing to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming the current best methods.
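The abstract does not give the loss in closed form, so what follows is only a minimal sketch, under our own assumptions, of how distance-aware ranking supervision could be set up. Great-circle (haversine) distances between the query's GPS coordinates and each candidate serve as ranking targets. A first-order term asks closer candidates to score higher (absolute distances), and a second-order term asks larger distance gaps between two candidates to produce larger score gaps (relative distances). The logistic pairwise form, the equal weighting, and the names `pairwise_rank_loss`, `multi_order_loss`, and `second_order_weight` are hypothetical and are not GeoRanker's published formulation. The hand-set `scores` stand in for the proximity scores a vision-language model would predict.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_rank_loss(scores, preference):
    """Average logistic loss over all pairs (hypothetical form): the item with the
    larger preference value should receive the larger score. Ties are skipped."""
    total, count = 0.0, 0
    for a, b in combinations(range(len(scores)), 2):
        if preference[a] == preference[b]:
            continue
        hi, lo = (a, b) if preference[a] > preference[b] else (b, a)
        total += softplus(scores[lo] - scores[hi])  # small when scores[hi] >> scores[lo]
        count += 1
    return total / count if count else 0.0


def multi_order_loss(scores, dists_km, second_order_weight=1.0):
    """Hypothetical multi-order distance loss.

    First order: a candidate closer to the query should score higher.
    Second order: for candidate pairs with d_near < d_far, a larger distance gap
    (d_far - d_near) should come with a larger score gap (s_near - s_far).
    """
    first = pairwise_rank_loss(scores, [-d for d in dists_km])
    gap_scores, gap_dists = [], []
    for i, j in combinations(range(len(scores)), 2):
        if dists_km[i] == dists_km[j]:
            continue
        near, far = (i, j) if dists_km[i] < dists_km[j] else (j, i)
        gap_scores.append(scores[near] - scores[far])
        gap_dists.append(dists_km[far] - dists_km[near])
    second = pairwise_rank_loss(gap_scores, gap_dists)
    return first + second_order_weight * second


# Illustrative usage: one query and three retrieved candidates (coordinates in degrees).
query = (48.8566, 2.3522)  # Paris
candidates = [(48.86, 2.35), (51.5074, -0.1278), (40.7128, -74.0060)]  # Paris, London, New York
dists = [haversine_km(*query, *c) for c in candidates]
scores = [2.5, 1.0, -1.0]  # stand-in for model-predicted proximity scores
loss = multi_order_loss(scores, dists)
# Second stage of the pipeline: pick the top-scoring candidate.
best = max(range(len(candidates)), key=lambda k: scores[k])
```

Taking the arg-max over scores mirrors the "select the best match" stage of the two-stage pipeline; in a trained system the scores would come from the ranking model, and the loss above would only be used during training.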

May 21, 2025