3D Question Answering for City Scene Understanding

July 24, 2024
Authors: Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu
cs.AI

Abstract

3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. While existing research has primarily focused on indoor household tasks and outdoor roadside autonomous driving tasks, city-level scene understanding has received limited attention. Furthermore, existing research struggles to understand city scenes because spatial semantic information and human-environment interaction information are absent at the city level. To address these challenges, we investigate 3D MQA from both dataset and method perspectives. From the dataset perspective, we introduce a novel 3D MQA dataset named City-3DQA for city-level scene understanding, which is the first dataset to incorporate scene semantics and human-environment interactive tasks within the city. From the method perspective, we propose a Scene graph enhanced City-level Understanding method (Sg-CityU), which uses the scene graph to introduce spatial semantics. We report a new benchmark, on which our proposed Sg-CityU achieves accuracies of 63.94% and 63.76% in different settings of City-3DQA. Compared to indoor 3D MQA methods and zero-shot use of advanced large language models (LLMs), Sg-CityU demonstrates state-of-the-art (SOTA) performance in robustness and generalization.
