DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

October 21, 2025
Authors: Ziang Zhang, Zehan Wang, Guanghao Zhang, Weilong Dai, Yan Xia, Ziang Yan, Minjie Hong, Zhou Zhao
cs.AI

Abstract

Reasoning about dynamic spatial relationships is essential, as observers and objects often move simultaneously. Although vision-language models (VLMs) and visual expert models excel at 2D tasks and static scenarios, their ability to fully understand dynamic 3D scenarios remains limited. We introduce Dynamic Spatial Intelligence and propose DSI-Bench, a benchmark with nearly 1,000 dynamic videos and over 1,700 manually annotated questions covering nine decoupled motion patterns of observers and objects. Spatially and temporally symmetric designs reduce biases and enable systematic evaluation of models' reasoning about self-motion and object motion. Our evaluation of 14 VLMs and expert models reveals key limitations: models often conflate observer and object motion, exhibit semantic biases, and fail to accurately infer relative relationships in dynamic scenarios. DSI-Bench provides findings and insights to guide the future development of general-purpose and expert models with dynamic spatial intelligence.
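
The abstract does not say how the spatially and temporally symmetric design is built, so the following is only an illustrative sketch under stated assumptions, not the authors' pipeline. It assumes that each question has a single motion-direction answer, that a horizontal flip swaps "left" and "right", and that playing a video backwards inverts every motion direction. The `Sample` class, the label names, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical answer labels; the benchmark's real answer options are not
# given in the abstract.
FLIP_H = {"left": "right", "right": "left"}       # horizontal mirror
REVERSE_T = {"left": "right", "right": "left",    # time reversal inverts
             "forward": "backward",               # every motion direction
             "backward": "forward"}


@dataclass(frozen=True)
class Sample:
    video_id: str
    flipped: bool     # video mirrored horizontally
    reversed_: bool   # video played backwards
    answer: str       # ground-truth motion direction


def remap(answer, mapping):
    """Return the label after a transform; unmapped labels are unchanged."""
    return mapping.get(answer, answer)


def symmetric_variants(s):
    """Original, flipped, reversed, and flipped+reversed copies of a sample."""
    h = replace(s, flipped=not s.flipped, answer=remap(s.answer, FLIP_H))
    t = replace(s, reversed_=not s.reversed_, answer=remap(s.answer, REVERSE_T))
    ht = replace(h, reversed_=not h.reversed_, answer=remap(h.answer, REVERSE_T))
    return [s, h, t, ht]


def accuracy(predict, samples):
    """Fraction of samples for which predict(sample) equals the answer."""
    return sum(predict(s) == s.answer for s in samples) / len(samples)


if __name__ == "__main__":
    variants = symmetric_variants(Sample("v1", False, False, "left"))
    print([v.answer for v in variants])
    # A model that always answers "left" cannot beat chance on the group.
    print(accuracy(lambda s: "left", variants))
```

Under these assumptions, every answer label appears equally often across a sample's four variants, so a model that always prefers one direction scores exactly 50% on the group rather than benefiting from a skewed label distribution.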