LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
December 22, 2025
Authors: Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, Jiangmiao Pang
cs.AI
Abstract
Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, which limits generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) fine-tuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while its metric-aware geometry memory enhances planning consistency and obstacle avoidance, yielding an improvement of more than 27.3% over oracle-localization baselines and strong generalization across embodiments and environments. The code and models are publicly available on the project page: https://steinate.github.io/logoplanner.github.io/.
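
To illustrate why absolute metric scale matters for localization-grounded planning, the following is a minimal sketch, not the authors' implementation. It assumes a planar robot whose pose is obtained by chaining relative motions, and it compares the remaining distance to a goal when the translation scale is correct versus under-estimated by 20%. All names here (`compose`, `goal_in_robot_frame`, `integrate`, the step size, and the goal position) are hypothetical and chosen only for illustration.

```python
import math

def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), given in the robot frame,
    to a planar pose (x, y, theta) given in the world frame."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + math.cos(th) * dx - math.sin(th) * dy,
            y + math.sin(th) * dx + math.cos(th) * dy,
            th + dth)

def goal_in_robot_frame(pose, goal):
    """Express a world-frame goal (gx, gy) in the robot frame of `pose`."""
    x, y, th = pose
    ex, ey = goal[0] - x, goal[1] - y
    return (math.cos(th) * ex + math.sin(th) * ey,
            -math.sin(th) * ex + math.cos(th) * ey)

def integrate(deltas, scale=1.0):
    """Chain relative motions from the origin; `scale` multiplies the
    translation to mimic an estimator without correct metric scale."""
    pose = (0.0, 0.0, 0.0)
    for dx, dy, dth in deltas:
        pose = compose(pose, (scale * dx, scale * dy, dth))
    return pose

# Hypothetical trajectory: ten forward steps of 0.5 m toward a goal 5 m ahead.
deltas = [(0.5, 0.0, 0.0)] * 10
goal = (5.0, 0.0)

true_pose = integrate(deltas)                # metric-scale estimate
biased_pose = integrate(deltas, scale=0.8)   # translation under-estimated by 20%

remaining_true = goal_in_robot_frame(true_pose, goal)
remaining_biased = goal_in_robot_frame(biased_pose, goal)
print("remaining (metric scale):", remaining_true)
print("remaining (biased scale):", remaining_biased)
```

In this toy setting the robot has in fact reached the goal, yet the scale-biased estimate still places the goal about 1 m ahead, so a planner conditioned on it would keep driving forward. The error grows with distance travelled, which is one way to see why metric-scale grounding matters for long-horizon navigation.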