SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
May 22, 2025
Authors: Haoning Wu, Xiao Huang, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie
cs.AI
Abstract
Multimodal large language models (MLLMs) have achieved impressive success in
question-answering tasks, yet their capabilities for spatial understanding are
less explored. This work investigates a critical question: do existing MLLMs
possess 3D spatial perception and understanding abilities? Concretely, we make
the following contributions in this paper: (i) we introduce VGBench, a
benchmark specifically designed to assess MLLMs for visual geometry perception,
e.g., camera pose and motion estimation; (ii) we propose SpatialScore, the most
comprehensive and diverse multimodal spatial understanding benchmark to date,
integrating VGBench with relevant data from 11 other existing datasets.
This benchmark comprises 28K samples across various spatial understanding
tasks, modalities, and QA formats, along with a carefully curated challenging
subset, SpatialScore-Hard; (iii) we develop SpatialAgent, a novel multi-agent
system incorporating 9 specialized tools for spatial understanding, supporting
both Plan-Execute and ReAct reasoning paradigms; (iv) we conduct extensive
evaluations to reveal persistent challenges in spatial reasoning while
demonstrating the effectiveness of SpatialAgent. We believe SpatialScore will
offer valuable insights and serve as a rigorous benchmark for the next
evolution of MLLMs.