OpenCity3D: What do Vision-Language Models know about Urban Environments?
March 21, 2025
Authors: Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann
cs.AI
Abstract
Vision-language models (VLMs) show great promise for 3D scene understanding
but are mainly applied to indoor spaces or autonomous driving, focusing on
low-level tasks like segmentation. This work expands their use to urban-scale
environments by leveraging 3D reconstructions from multi-view aerial imagery.
We propose OpenCity3D, an approach that addresses high-level tasks, such as
population density estimation, building age classification, property price
prediction, crime rate assessment, and noise pollution evaluation. Our findings
highlight OpenCity3D's impressive zero-shot and few-shot capabilities,
showcasing adaptability to new contexts. This research establishes a new
paradigm for language-driven urban analytics, enabling applications in
planning, policy, and environmental monitoring. See our project page:
opencity3d.github.io
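The zero-shot capability described above rests on a shared image-text embedding space, as produced by VLMs such as CLIP: rendered views of a scene and natural-language prompts are embedded jointly, and similarity between them yields attribute scores without task-specific training. The sketch below illustrates this general idea only; the prompts, model, and scoring details are hypothetical and not taken from the paper, and random vectors stand in for real model embeddings.

```python
import numpy as np

# Minimal sketch of zero-shot attribute scoring in a shared
# image-text embedding space (the general CLIP-style mechanism;
# NOT OpenCity3D's actual pipeline). Random placeholder vectors
# stand in for real VLM embeddings.
rng = np.random.default_rng(0)

def unit(v):
    # Normalize to unit length so dot products are cosine similarities.
    return v / np.linalg.norm(v)

# Placeholder embedding of one rendered building view (512-d, CLIP-like).
view_embedding = unit(rng.normal(size=512))

# Hypothetical prompts probing a high-level attribute (building age).
prompts = [
    "a building constructed before 1950",
    "a building constructed after 2000",
]
prompt_embeddings = np.stack([unit(rng.normal(size=512)) for _ in prompts])

# Cosine similarity between the view and each prompt.
sims = prompt_embeddings @ view_embedding

# Softmax over prompts turns similarities into relative scores.
scores = np.exp(sims) / np.exp(sims).sum()
for prompt, score in zip(prompts, scores):
    print(f"{score:.3f}  {prompt}")
```

In practice such per-view scores would be computed with a pretrained VLM's image and text encoders and then aggregated over many views of each building; the aggregation strategy here is left open, as the abstract does not specify it.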