

OpenCity3D: What do Vision-Language Models know about Urban Environments?

March 21, 2025
作者: Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann
cs.AI

Abstract

Vision-language models (VLMs) show great promise for 3D scene understanding but are mainly applied to indoor spaces or autonomous driving, focusing on low-level tasks like segmentation. This work expands their use to urban-scale environments by leveraging 3D reconstructions from multi-view aerial imagery. We propose OpenCity3D, an approach that addresses high-level tasks, such as population density estimation, building age classification, property price prediction, crime rate assessment, and noise pollution evaluation. Our findings highlight OpenCity3D's impressive zero-shot and few-shot capabilities, showcasing adaptability to new contexts. This research establishes a new paradigm for language-driven urban analytics, enabling applications in planning, policy, and environmental monitoring. See our project page: opencity3d.github.io
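The zero-shot capability described above can be illustrated with a minimal sketch: score each building against natural-language prompts by cosine similarity between its visual embedding and the prompts' text embeddings, CLIP-style. This is an assumption about the general mechanism, not the paper's exact pipeline; the function name, the random stand-in embeddings, and the prompt wording are all hypothetical.

```python
import numpy as np

def zero_shot_scores(building_embs, prompt_embs):
    """Score buildings against text prompts via cosine similarity.

    building_embs: (n_buildings, d) visual embeddings, e.g. from a
    CLIP-style VLM applied to rendered views of each building.
    prompt_embs: (n_prompts, d) text embeddings of prompts such as
    "an old, weathered building" vs. "a newly constructed building".
    Returns an (n_buildings, n_prompts) matrix; each row is a softmax
    over prompts, so the scores for one building sum to 1.
    """
    b = building_embs / np.linalg.norm(building_embs, axis=1, keepdims=True)
    p = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = b @ p.T
    e = np.exp(sims - sims.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

# Toy example: random vectors stand in for real VLM features.
rng = np.random.default_rng(0)
scores = zero_shot_scores(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(scores.shape)  # (5, 3)
```

With real embeddings, the per-prompt scores could then be aggregated over a 3D reconstruction to produce map-level estimates such as building age or noise exposure, with no task-specific training.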
