ChatPaper.ai

ObjectReact: Learning Object-Relative Control for Visual Navigation

September 11, 2025
Authors: Sourav Garg, Dustin Craggs, Vineeth Bhat, Lachlan Mares, Stefan Podgorski, Madhava Krishna, Feras Dayoub, Ian Reid
cs.AI

Abstract

Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach, in which control is estimated from a given pair of current observation and subgoal images. However, image-level representations of the world have limitations because images are strictly tied to the agent's pose and embodiment. In contrast, objects, being a property of the map, offer an embodiment- and trajectory-invariant world representation. In this work, we present a new paradigm of learning "object-relative" control that exhibits several desirable characteristics: a) new routes can be traversed without strictly requiring imitation of prior experience, b) the control prediction problem can be decoupled from the image matching problem, and c) high invariance can be achieved in cross-embodiment deployment, for variations across both training-testing and mapping-execution settings. We propose a topometric map representation in the form of a "relative" 3D scene graph, which is used to obtain more informative object-level global path planning costs. We train a local controller, dubbed "ObjectReact", conditioned directly on a high-level "WayObject Costmap" representation that eliminates the need for an explicit RGB input. We demonstrate the advantages of learning object-relative control over its image-relative counterpart across sensor height variations and multiple navigation tasks that challenge the underlying spatial understanding capability, e.g., navigating a map trajectory in the reverse direction. We further show that our sim-only policy generalizes well to real-world indoor environments. Code and supplementary material are accessible via the project page: https://object-react.github.io/
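To make the "costmap instead of RGB" idea concrete, here is a minimal toy sketch of a local controller conditioned only on a 2D cost grid. This is not the paper's learned ObjectReact policy or its actual WayObject Costmap format; the function name, grid layout, and hand-crafted steering rule are all illustrative assumptions. It shows only the interface decoupling: the controller consumes a cost representation, never a camera image.

```python
import numpy as np

def toy_object_relative_controller(costmap: np.ndarray, gain: float = 1.0):
    """Toy local controller conditioned on a costmap rather than RGB.

    costmap: (H, W) grid where columns index candidate headings and lower
    values mark directions toward low-cost objects along the planned path
    (a hypothetical stand-in for the paper's learned "WayObject Costmap").
    Returns a (linear_velocity, angular_velocity) command.
    """
    h, w = costmap.shape
    # Aggregate cost per heading (column) over the near field (bottom rows).
    near = costmap[h // 2:, :]
    per_heading = near.mean(axis=0)
    best = int(per_heading.argmin())
    # Steer toward the lowest-cost heading; normalized offset in [-1, 1].
    offset = (best - (w - 1) / 2) / ((w - 1) / 2)
    angular = -gain * offset
    # Slow down when even the best heading is costly.
    linear = float(1.0 / (1.0 + per_heading[best]))
    return linear, angular

# Costmap with a low-cost corridor to the right of center.
cm = np.ones((8, 9))
cm[:, 6] = 0.1
v, wz = toy_object_relative_controller(cm)
```

Because the controller's input is a map-derived cost representation rather than an egocentric image, the same interface can in principle serve different embodiments and camera heights, which is the invariance the abstract highlights.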
PDF · September 12, 2025