

IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

August 1, 2025
Authors: Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu
cs.AI

Abstract

Visual navigation with an image as the goal is a fundamental and challenging problem. Conventional methods either rely on end-to-end reinforcement learning (RL) or on modular policies that use a topological graph or a bird's-eye-view (BEV) map as memory; neither can fully model the geometric relationship between the explored 3D environment and the goal image. To localize the goal image in 3D space efficiently and accurately, we build our navigation system on the renderable 3D Gaussian Splatting (3DGS) representation. However, because 3DGS optimization is computationally intensive and the search space of 6-DoF camera poses is large, directly using 3DGS to localize the goal image during the agent's exploration is prohibitively inefficient. To this end, we propose IGL-Nav, an Incremental 3D Gaussian Localization framework for efficient and 3D-aware image-goal navigation. Specifically, we incrementally update the scene representation with feed-forward monocular prediction as new images arrive. We then coarsely localize the goal by using geometric information for discrete space matching, which can be formulated as an efficient 3D convolution. When the agent is close to the goal, we solve for the fine target pose by optimization via differentiable rendering. IGL-Nav outperforms existing state-of-the-art methods by a large margin across diverse experimental configurations. It also handles the more challenging free-view image-goal setting and can be deployed on a real-world robotic platform, with the goal image captured by a cellphone at an arbitrary pose. Project page: https://gwxuan.github.io/IGL-Nav/.
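The coarse localization step (discrete space matching expressed as a 3D convolution) can be illustrated with a small sketch. It is not the authors' implementation. It assumes, hypothetically, that both the explored scene and the goal image have already been lifted into voxel feature grids with the same channel count C. It also restricts the search to 90° yaw steps and whole-voxel translations, and it scores poses with raw (unnormalized) correlation. The function names `correlate3d` and `coarse_localize`, the grid shapes, and the toy data are illustrative choices. As in most deep-learning usage, "convolution" here means cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate3d(volume: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Valid-mode 3D cross-correlation, summed over the channel axis.

    volume:   (X, Y, Z, C) voxel features of the explored scene (hypothetical)
    template: (a, b, c, C) voxel features lifted from the goal image (hypothetical)
    returns:  (X-a+1, Y-b+1, Z-c+1) matching score for every voxel translation
    """
    windows = sliding_window_view(volume, template.shape[:3], axis=(0, 1, 2))
    # windows has shape (X-a+1, Y-b+1, Z-c+1, C, a, b, c)
    return np.einsum("ijkcxyz,xyzc->ijk", windows, template)


def coarse_localize(volume: np.ndarray, template: np.ndarray):
    """Exhaustive search over 4 discrete yaws (multiples of 90°) and all translations.

    Returns (best_score, yaw_index, voxel_offset), where yaw = yaw_index * 90°.
    """
    best = (-np.inf, None, None)
    for r in range(4):
        # Rotate the template by r * 90 degrees about the vertical (z) axis.
        rotated = np.rot90(template, k=r, axes=(0, 1))
        if any(t > v for t, v in zip(rotated.shape[:3], volume.shape[:3])):
            continue  # the rotated template does not fit inside the scene grid
        scores = correlate3d(volume, rotated)
        idx = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[idx] > best[0]:
            best = (float(scores[idx]), r, tuple(int(i) for i in idx))
    return best


# Toy usage: hide a 180°-rotated copy of a random template in an empty scene.
rng = np.random.default_rng(0)
template = rng.random((3, 4, 2, 8))
volume = np.zeros((10, 12, 5, 8))
volume[5:8, 2:6, 1:3] = np.rot90(template, k=2, axes=(0, 1))

score, yaw, offset = coarse_localize(volume, template)
print(yaw, offset)  # 2 (5, 2, 1)
```

Each yaw hypothesis costs a single correlation over the whole grid, so the search becomes a batch of dense convolutions instead of a per-pose rendering loop. A practical system would likely use finer rotation bins and a normalized score so that densely occupied regions do not dominate the match. The result would then serve only as an initialization for the fine, rendering-based pose refinement.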
August 4, 2025