GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Abstract

Worldwide image geolocalization, the task of predicting GPS coordinates from images taken anywhere on Earth, poses a fundamental challenge because visual content varies enormously across regions. Recent approaches adopt a two-stage pipeline that retrieves candidates and then selects the best match, but they typically rely on simplistic similarity heuristics and point-wise supervision and therefore fail to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming current best methods.
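The abstract does not give the formula for the multi-order distance loss, so the following is only an illustrative sketch of the general idea, not the paper's implementation. It assumes a pairwise logistic ranking loss, with a first-order term that asks closer candidates to score higher and a second-order term that asks larger distance gaps to produce larger score gaps. All function names, the `second_order_weight` parameter, and the choice of logistic loss are assumptions introduced here for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_logistic(values, relevance):
    """Mean logistic loss over pairs: the more relevant item should get the larger value."""
    losses = []
    for i, j in combinations(range(len(values)), 2):
        if relevance[i] == relevance[j]:
            continue  # ties carry no ordering information
        hi, lo = (i, j) if relevance[i] > relevance[j] else (j, i)
        losses.append(softplus(-(values[hi] - values[lo])))
    return sum(losses) / len(losses) if losses else 0.0


def multi_order_loss(scores, distances, second_order_weight=1.0):
    """Hypothetical two-term ranking loss over candidate scores and distances to the query."""
    # First order (assumed): a candidate closer to the query should score higher.
    first = pairwise_logistic(scores, [-d for d in distances])

    # Second order (assumed): a larger distance gap should give a larger score gap.
    score_gaps, dist_gaps = [], []
    for i, j in combinations(range(len(scores)), 2):
        if distances[i] == distances[j]:
            continue
        near, far = (i, j) if distances[i] < distances[j] else (j, i)
        score_gaps.append(scores[near] - scores[far])
        dist_gaps.append(distances[far] - distances[near])
    second = pairwise_logistic(score_gaps, dist_gaps)

    return first + second_order_weight * second
```

In this sketch, `scores` stands in for the proximity predictions a ranking model would produce for each candidate, and `distances` would come from `haversine_km` applied to the query's ground-truth location and each candidate's coordinates. A score list that orders candidates from nearest to farthest yields a lower loss than the reversed ordering.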
