SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Abstract

Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens distortion, and compression artifacts. This raises a fundamental question: how robust is the spatial intelligence of current MLLMs when visual observations are imperfect? To answer this question, we introduce SpaceDG, the first large-scale dataset for degradation-aware spatial understanding. It is constructed with a physically grounded degradation synthesis engine that embeds the degradation formation process into 3D Gaussian Splatting (3DGS) rendering, enabling realistic simulation of nine degradation types. The resulting dataset contains approximately 1M QA pairs from nearly 1,000 indoor scenes. We further introduce SpaceDG-Bench, a human-verified benchmark with 1,102 questions spanning 11 reasoning categories and 9 visual degradation types, yielding over 10K VQA instances. Evaluating 25 open- and closed-source MLLMs reveals that visual degradations consistently and substantially impair spatial reasoning, exposing a critical robustness gap. Finally, we show that fine-tuning on SpaceDG markedly improves degradation robustness and can even surpass human performance under degraded conditions without any performance drop on clean images, highlighting the promise of degradation-aware training for robust spatial intelligence.