

SpatialTree: How Spatial Abilities Branch Out in MLLMs

December 23, 2025
Authors: Yuxi Xiao, Longfei Li, Shen Yan, Xinhang Liu, Sida Peng, Yunchao Wei, Xiaowei Zhou, Bingyi Kang
cs.AI

Abstract

Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood, as most studies focus on a narrow set of tasks. We introduce SpatialTree, a cognitive-science-inspired hierarchy that organizes spatial abilities into four levels: low-level perception (L1), mental mapping (L2), simulation (L3), and agentic competence (L4). Based on this taxonomy, we construct the first capability-centric hierarchical benchmark and evaluate mainstream MLLMs across 27 sub-abilities. The evaluation results reveal a clear structure: L1 skills are largely orthogonal, whereas higher-level skills are strongly correlated, indicating increasing interdependency. Through targeted supervised fine-tuning, we uncover a surprising transfer dynamic: negative transfer within L1, but strong cross-level transfer from low- to high-level abilities with notable synergy. Finally, we explore how to improve the entire hierarchy. We find that naive RL that encourages extensive "thinking" is unreliable: it helps complex reasoning but hurts intuitive perception. We propose a simple auto-think strategy that suppresses unnecessary deliberation, enabling RL to consistently improve performance across all levels. By building SpatialTree, we provide a proof-of-concept framework for understanding and systematically scaling spatial abilities in MLLMs.
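The claim that L1 skills are "largely orthogonal" while higher-level skills are "strongly correlated" can be made concrete with a small sketch. The abstract does not say how correlation was measured, so the code below is only one plausible reading: it assumes each sub-ability has one score per evaluated model and averages the absolute Pearson correlation over all skill pairs within a level. The skill names and numbers are hypothetical toy values chosen for illustration; they are not results from the paper.

```python
from itertools import combinations
from statistics import mean

# The four levels named in the abstract.
LEVELS = {
    "L1": "low-level perception",
    "L2": "mental mapping",
    "L3": "simulation",
    "L4": "agentic competence",
}

def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / (var_x * var_y) ** 0.5

def mean_abs_correlation(scores_by_skill):
    """Average |r| over all skill pairs; each value holds one score per model."""
    pairs = combinations(scores_by_skill.values(), 2)
    return mean(abs(pearson(a, b)) for a, b in pairs)

# Toy, made-up scores for four hypothetical models (NOT data from the paper).
toy_low = {"skill_a": [1.0, 2.0, 3.0, 4.0], "skill_b": [1.0, -1.0, -1.0, 1.0]}
toy_high = {"skill_c": [1.0, 2.0, 3.0, 4.0], "skill_d": [2.0, 4.0, 6.0, 8.0]}

print("toy low-level group :", mean_abs_correlation(toy_low))
print("toy high-level group:", mean_abs_correlation(toy_high))
```

Under this reading, a value near 0 for a level would correspond to "largely orthogonal" skills, and a value near 1 to "strongly correlated" ones.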