"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

August 21, 2025
Authors: Jon E. Froehlich, Jared Hwang, Zeyu Wang, John S. O'Meara, Xia Su, William Huang, Yang Zhang, Alex Fiannaca, Philip Nelson, Shaun Kane
cs.AI

Abstract

Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents--multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos) combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.