Geometrically-Constrained Agent for Spatial Reasoning

November 27, 2025
Authors: Zeren Chen, Xiaoya Lu, Zhijie Zheng, Pengrui Li, Lehan He, Yijin Zhou, Jing Shao, Bohan Zhuang, Lu Sheng
cs.AI

Abstract

Vision Language Models (VLMs) exhibit a fundamental semantic-to-geometric gap in spatial reasoning: they excel at qualitative semantic inference, but their reasoning operates within a lossy semantic space that is misaligned with high-fidelity geometry. Current paradigms fail to bridge this gap. Training-based methods suffer from an "oracle paradox," learning flawed spatial logic from imperfect oracles. Tool-integrated methods constrain the final computation but, critically, leave the VLM's planning process unconstrained, resulting in geometrically flawed plans. In this work, we propose the Geometrically-Constrained Agent (GCA), a training-free agentic paradigm that resolves this gap by introducing a formal task constraint. Specifically, we strategically decouple the VLM's role into two stages. First, acting as a semantic analyst, the VLM translates the user's ambiguous query into a formal, verifiable task constraint, which defines the reference frame and the objective. Second, acting as a task solver, the VLM generates and executes tool calls strictly within the deterministic bounds defined by the constraint. This geometrically-constrained reasoning strategy resolves the semantic-to-geometric gap, yielding a robust and verifiable reasoning pathway for spatial reasoning. Comprehensive experiments demonstrate that GCA achieves state-of-the-art performance on multiple spatial reasoning benchmarks, surpassing existing training-based and tool-integrated methods by approximately 27%. Please see our homepage at https://gca-spatial-reasoning.github.io.
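
The two-stage split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names TaskConstraint, ToolCall, analyze_query, and violations, the tool names, and the keyword rule standing in for the VLM are all hypothetical and chosen only for illustration. The sketch shows one way a formal constraint (reference frame, objective, permitted tools) could be checked against a proposed plan of tool calls before anything is executed.

```python
from __future__ import annotations

from dataclasses import dataclass


# All names below are hypothetical; none are taken from the GCA codebase.
@dataclass(frozen=True)
class TaskConstraint:
    reference_frame: str      # frame in which geometry must be expressed
    objective: str            # what quantity the task asks for
    allowed_tools: frozenset  # tools the solver is permitted to call


@dataclass(frozen=True)
class ToolCall:
    tool: str   # name of the tool to invoke
    frame: str  # reference frame the call operates in


def analyze_query(query: str) -> TaskConstraint:
    """Stage 1 stand-in: a real system would prompt the VLM here.

    This toy rule only looks for one phrase to choose the frame.
    """
    frame = "object" if "from the perspective of" in query.lower() else "camera"
    tools = frozenset({"detect_objects", "estimate_pose", "compute_direction"})
    return TaskConstraint(frame, "relative_direction", tools)


def violations(plan: list[ToolCall], constraint: TaskConstraint) -> list[str]:
    """Stage 2 guard: list every way the plan breaks the constraint."""
    problems = []
    for i, call in enumerate(plan):
        if call.tool not in constraint.allowed_tools:
            problems.append(f"step {i}: tool '{call.tool}' is not permitted")
        if call.frame != constraint.reference_frame:
            problems.append(
                f"step {i}: frame '{call.frame}' differs from "
                f"'{constraint.reference_frame}'"
            )
    return problems
```

In this sketch, a plan is executed only when violations returns an empty list. A plan that computes a direction in the camera frame while the constraint fixes an object-centric frame would be rejected before any geometry is computed, which is the kind of planning error the abstract attributes to unconstrained tool-integrated methods.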