SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
April 15, 2026
Authors: Dinging Li, Yingxiu Zhao, Xinrui Cheng, Kangheng Lin, Hongbo Peng, Hongxing Li, Zixuan Wang, Yuhong Dai, Haodong Li, Jia Wang, Yukang Shi, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
cs.AI
Abstract
Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by the cost of geometric annotation. The self-evolving paradigm offers a promising path, but its reliance on model consensus to construct pseudo-labels causes training to reinforce rather than correct the model's own geometric errors. We identify a property unique to 3D spatial reasoning that circumvents this limitation: ground truth is a deterministic consequence of the underlying geometry, computable exactly from point clouds and camera poses without any model involvement. Building on this insight, we present SpatialEvo, a self-evolving framework for 3D spatial reasoning, centered on the Deterministic Geometric Environment (DGE). The DGE formalizes 16 spatial reasoning task categories under explicit geometric validation rules and converts unannotated 3D scenes into zero-noise interactive oracles, replacing model consensus with objective physical feedback. A single shared-parameter policy co-evolves across questioner and solver roles under DGE constraints: the questioner generates physically valid spatial questions grounded in scene observations, while the solver derives precise answers against DGE-verified ground truth. A task-adaptive scheduler endogenously concentrates training on the model's weakest categories, producing a dynamic curriculum without manual design. Experiments across nine benchmarks demonstrate that SpatialEvo achieves the highest average score at both 3B and 7B scales, with consistent gains on spatial reasoning benchmarks and no degradation on general visual understanding.
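To make the central idea concrete, the following is a minimal sketch of how ground truth for two toy spatial tasks could be computed purely from geometry, and how a scheduler might weight training toward weak categories. It is not the paper's implementation: every function name, the world-to-camera convention (x_cam = R x_world + t), the assumption that camera +x points right, the 5% answer tolerance, and the (1 - accuracy) weighting rule are hypothetical choices made for illustration only.

```python
import math
from statistics import fmean

Point = tuple[float, float, float]
Matrix = tuple[Point, Point, Point]


def centroid(points: list[Point]) -> Point:
    """Mean position of an object's points (world frame)."""
    return (
        fmean(p[0] for p in points),
        fmean(p[1] for p in points),
        fmean(p[2] for p in points),
    )


def object_distance(a: list[Point], b: list[Point]) -> float:
    """Ground-truth distance between two objects, taken between centroids."""
    return math.dist(centroid(a), centroid(b))


def to_camera_frame(point: Point, rotation: Matrix, translation: Point) -> Point:
    """World -> camera transform, assuming x_cam = R @ x_world + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def left_or_right(points: list[Point], rotation: Matrix, translation: Point) -> str:
    """Side of the view an object falls on, assuming camera +x points right."""
    x_cam = to_camera_frame(centroid(points), rotation, translation)[0]
    return "right" if x_cam > 0 else "left"


def verify(answer: float, truth: float, rel_tol: float = 0.05) -> bool:
    """Accept a numeric answer within a relative tolerance (assumed 5%)."""
    return math.isclose(answer, truth, rel_tol=rel_tol)


def task_weights(accuracy: dict[str, float], floor: float = 0.05) -> dict[str, float]:
    """Sampling weights proportional to (1 - accuracy), with a small floor
    so that mastered tasks are still occasionally revisited."""
    raw = {task: max(1.0 - acc, floor) for task, acc in accuracy.items()}
    total = sum(raw.values())
    return {task: w / total for task, w in raw.items()}


# Toy scene: two "objects" given as tiny point clouds.
chair = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
table = [(4.0, 4.0, 0.0)]
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

truth = object_distance(chair, table)
side = left_or_right(table, identity, (0.0, 0.0, 0.0))
weights = task_weights({"distance": 0.9, "direction": 0.5})
```

In this sketch no model is consulted when producing `truth` or `side`; both follow from the points and the camera pose alone, which is the property the abstract relies on. The scheduler here gives the weaker `direction` task a larger sampling weight than `distance`.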