"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

Abstract

Interactive digital maps have revolutionized how people travel and learn about the world. However, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), which limits their ability to address geo-visual questions about what the world looks like. We introduce our vision for Geo-Visual Agents: multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos), combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.

