LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

December 22, 2025
Authors: Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, Jiangmiao Pang
cs.AI

Abstract

Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, which limits generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions in absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models are publicly available on the project page: https://steinate.github.io/logoplanner.github.io/