DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
October 21, 2025
Authors: Ziang Zhang, Zehan Wang, Guanghao Zhang, Weilong Dai, Yan Xia, Ziang Yan, Minjie Hong, Zhou Zhao
cs.AI
Abstract
Reasoning about dynamic spatial relationships is essential, as both observers
and objects often move simultaneously. Although vision-language models (VLMs)
and visual expert models excel in 2D tasks and static scenarios, their
ability to fully understand dynamic 3D scenarios remains limited. We introduce
Dynamic Spatial Intelligence and propose DSI-Bench, a benchmark with nearly
1,000 dynamic videos and over 1,700 manually annotated questions covering nine
decoupled motion patterns of observers and objects. Spatially and temporally
symmetric designs reduce biases and enable systematic evaluation of models'
reasoning about self-motion and object motion. Our evaluation of 14 VLMs and
expert models reveals key limitations: models often conflate observer and
object motion, exhibit semantic biases, and fail to accurately infer relative
relationships in dynamic scenarios. DSI-Bench offers findings and insights to
guide the future development of general and expert models with dynamic spatial
intelligence.