MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory

November 27, 2025
Authors: Bo Wang, Jiehong Lin, Chenzhi Liu, Xinting Hu, Yifei Yu, Tianjia Liu, Zhongrui Wang, Xiaojuan Qi
cs.AI

Abstract

We present MG-Nav (Memory-Guided Navigation), a dual-scale framework for zero-shot visual navigation that unifies global memory-guided planning with local geometry-enhanced control. At its core is the Sparse Spatial Memory Graph (SMG), a compact, region-centric memory in which each node aggregates multi-view keyframes and object semantics, capturing both appearance and spatial structure while preserving viewpoint diversity. At the global level, the agent is localized on the SMG, and a goal-conditioned node path is planned via image-to-instance hybrid retrieval, producing a sequence of reachable waypoints for long-horizon guidance. At the local level, a navigation foundation policy executes these waypoints in point-goal mode with obstacle-aware control, and switches to image-goal mode when navigating from the final node towards the visual target. To further enhance viewpoint alignment and goal recognition, we introduce the VGGT-adapter, a lightweight geometric module built on the pre-trained VGGT model, which aligns observation and goal features in a shared 3D-aware space. MG-Nav runs global planning and local control at different frequencies, using periodic re-localization to correct errors. Experiments on the HM3D Instance-Image-Goal and MP3D Image-Goal benchmarks demonstrate that MG-Nav achieves state-of-the-art zero-shot performance and remains robust under dynamic rearrangements and unseen scene conditions.
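The global stage described in the abstract (localize on the memory graph, retrieve a goal node, plan a node path, then hand waypoints to a local policy) can be illustrated with a small sketch. This is not the authors' implementation: the `Node` layout, the cosine-similarity scoring, the fixed `bonus` for a matching object label, the breadth-first planner, and the toy three-room graph are all assumptions made for illustration, and the abstract does not specify any of them.

```python
from collections import deque
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """Hypothetical region node: a position, several view descriptors, object labels."""
    position: tuple
    keyframes: list                      # one feature vector per stored view
    objects: set = field(default_factory=set)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(nodes, query, label=None, bonus=0.5):
    """Return the id of the best-matching node.

    Score = best image similarity over the node's views, plus an assumed
    fixed bonus when the node contains the queried object label.
    """
    def score(node_id):
        node = nodes[node_id]
        best_view = max((cosine(query, k) for k in node.keyframes), default=0.0)
        return best_view + (bonus if label in node.objects else 0.0)
    return max(nodes, key=score)


def plan(edges, start, goal):
    """Breadth-first search for a fewest-hops node path; None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in edges.get(cur, ()):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None


def global_plan(nodes, edges, observation, goal_feature, goal_label=None):
    """Localize, retrieve the goal node, and emit (mode, target) steps."""
    start = retrieve(nodes, observation)
    target = retrieve(nodes, goal_feature, goal_label)
    path = plan(edges, start, target)
    if path is None:
        return []
    steps = [("point-goal", nodes[n].position) for n in path]
    steps.append(("image-goal", None))   # final approach uses the goal image
    return steps


# Toy graph with made-up descriptors.
nodes = {
    "hall": Node((0.0, 0.0), [[1.0, 0.0, 0.0]], {"door"}),
    "kitchen": Node((4.0, 0.0), [[0.0, 1.0, 0.0]], {"fridge"}),
    "bedroom": Node((4.0, 3.0), [[0.0, 0.0, 1.0], [0.0, 0.6, 0.8]], {"bed"}),
}
edges = {"hall": ["kitchen"], "kitchen": ["hall", "bedroom"], "bedroom": ["kitchen"]}

steps = global_plan(nodes, edges, [0.9, 0.1, 0.0], [0.0, 0.5, 0.5], "bed")
for mode, target in steps:
    print(mode, target)
```

Storing several descriptors per node and taking the maximum similarity is one simple way to mimic the viewpoint diversity the abstract attributes to the SMG. In the described system this global plan would be recomputed periodically for re-localization, while the local policy runs at a higher rate.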
December 4, 2025