Geometrically-Constrained Agent for Spatial Reasoning

November 27, 2025
Authors: Zeren Chen, Xiaoya Lu, Zhijie Zheng, Pengrui Li, Lehan He, Yijin Zhou, Jing Shao, Bohan Zhuang, Lu Sheng
cs.AI

Abstract

Vision Language Models (VLMs) exhibit a fundamental semantic-to-geometric gap in spatial reasoning: they excel at qualitative semantic inference, but their reasoning operates within a lossy semantic space that is misaligned with high-fidelity geometric space. Current paradigms fail to bridge this gap. Training-based methods suffer from an "oracle paradox," learning flawed spatial logic from imperfect oracles. Tool-integrated methods constrain the final computation but critically leave the VLM's planning process unconstrained, resulting in geometrically flawed plans. In this work, we propose the Geometrically-Constrained Agent (GCA), a training-free agentic paradigm that resolves this gap by introducing a formal task constraint. Specifically, we strategically decouple the VLM's role into two stages. First, acting as a semantic analyst, the VLM translates the user's ambiguous query into a formal, verifiable task constraint, which defines the reference frame and the objective. Second, acting as a task solver, the VLM generates and executes tool calls strictly within the deterministic bounds defined by that constraint. This geometrically-constrained reasoning strategy successfully resolves the semantic-to-geometric gap, yielding a robust and verifiable reasoning pathway for spatial reasoning. Comprehensive experiments demonstrate that GCA achieves state-of-the-art (SOTA) performance on multiple spatial reasoning benchmarks, surpassing existing training-based and tool-integrated methods by approximately 27%. Please see our homepage at https://gca-spatial-reasoning.github.io.
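The two-stage idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the names `TaskConstraint`, `ToolCall`, and `validate_plan` are hypothetical, the `allowed_tools` field is an assumption added for illustration, and the constraint that a VLM would produce in the first stage is hard-coded here. The sketch only shows what it might mean to check a plan of tool calls against a formal constraint that fixes the reference frame and the objective.

```python
from dataclasses import dataclass

# All names below are hypothetical and for illustration only; they are not
# taken from the GCA paper or its code release.


@dataclass(frozen=True)
class TaskConstraint:
    reference_frame: str      # frame in which the answer must be expressed
    objective: str            # what quantity the task asks for
    allowed_tools: frozenset  # assumption: tools permitted under this constraint


@dataclass(frozen=True)
class ToolCall:
    tool: str   # name of the tool the solver wants to invoke
    frame: str  # reference frame the call operates in


def validate_plan(constraint, plan):
    """Return a list of violations; an empty list means the plan is in bounds."""
    violations = []
    for i, call in enumerate(plan):
        if call.tool not in constraint.allowed_tools:
            violations.append(f"step {i}: tool '{call.tool}' is not allowed")
        if call.frame != constraint.reference_frame:
            violations.append(
                f"step {i}: frame '{call.frame}' differs from "
                f"'{constraint.reference_frame}'"
            )
    return violations


# Stage 1 (semantic analyst) would produce this constraint from the user's
# query; it is hard-coded here as a stand-in.
constraint = TaskConstraint(
    reference_frame="object:chair",
    objective="relative_direction",
    allowed_tools=frozenset({"estimate_pose", "compute_direction"}),
)

# Stage 2 (task solver) proposes tool calls, which are checked before execution.
good_plan = [
    ToolCall("estimate_pose", "object:chair"),
    ToolCall("compute_direction", "object:chair"),
]
bad_plan = [ToolCall("compute_direction", "camera")]  # wrong reference frame

print(validate_plan(constraint, good_plan))
print(validate_plan(constraint, bad_plan))
```

In this sketch a plan that silently switches to the camera frame is rejected before any tool runs, which is one plausible reading of generating tool calls "strictly within the deterministic bounds defined by the constraint."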