Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

January 8, 2026
Authors: Yuxiang Ji, Yong Wang, Ziyu Ma, Yiming Hu, Hailang Huang, Xuecai Hu, Guanhua Chen, Liaoni Wu, Xiangxiang Chu
cs.AI

Abstract

The image geolocalization task aims to predict where on Earth an image was taken from its visual clues. Existing large vision-language model (LVLM) approaches leverage world knowledge, chain-of-thought reasoning, and agentic capabilities, but overlook a common strategy used by humans: using maps. In this work, we are the first to equip the model with a Thinking with Map ability, formulating it as an agent-in-the-map loop. We develop a two-stage optimization scheme for it: agentic reinforcement learning (RL) followed by parallel test-time scaling (TTS). RL strengthens the agentic capability of the model to improve sampling efficiency, and parallel TTS lets the model explore multiple candidate paths before making its final prediction, which is crucial for geolocalization. To evaluate our method on up-to-date, in-the-wild images, we further present MAPBench, a comprehensive geolocalization training and evaluation benchmark composed entirely of real-world images. Experimental results show that our method outperforms existing open- and closed-source models on most metrics; in particular, it improves Acc@500m from 8.0% to 22.1% compared with Gemini-3-Pro in Google Search/Map grounded mode.
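The headline metric, Acc@500m, is conventionally the share of test images whose predicted coordinates fall within 500 m (great-circle distance) of the ground-truth location. The abstract does not spell out the exact definition, so the following is only a minimal sketch assuming that standard reading and a spherical Earth of mean radius 6,371 km; the names `haversine_km` and `acc_at` are hypothetical and not taken from the paper's code.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees

# Assumption: spherical Earth with this mean radius (not stated in the source).
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp so floating-point rounding cannot push asin's argument above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def acc_at(preds: Sequence[LatLon], truths: Sequence[LatLon], threshold_km: float) -> float:
    """Hypothetical Acc@d: fraction of predictions within threshold_km of the truth."""
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and of equal length")
    hits = sum(haversine_km(p, t) <= threshold_km for p, t in zip(preds, truths))
    return hits / len(preds)


if __name__ == "__main__":
    truths = [(48.8584, 2.2945), (40.6892, -74.0445)]  # Eiffel Tower, Statue of Liberty
    preds = [(48.8580, 2.2950), (40.7580, -73.9855)]   # ~60 m off; ~9 km off (Times Square)
    print(acc_at(preds, truths, 0.5))   # Acc@500m -> 0.5
    print(acc_at(preds, truths, 25.0))  # Acc@25km -> 1.0
```

In this toy example only the first prediction lands within 500 m, so Acc@500m is 0.5, while both fall within 25 km. A benchmark may instead use ellipsoidal geodesic distances; the spherical approximation typically differs from those by well under 1%.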