NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
May 13, 2025
Authors: Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang, Jiangmiao Pang
cs.AI
Abstract
Learning navigation in dynamic open-world environments is an important yet
challenging skill for robots. Most previous methods rely on precise
localization and mapping or learn from expensive real-world demonstrations. In
this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end
framework trained solely in simulation that can zero-shot transfer to different
embodiments in diverse real-world environments. The key ingredient of NavDP's
network is the combination of diffusion-based trajectory generation and a
critic function for trajectory selection, both conditioned only on local
observation tokens encoded from a shared policy transformer. Given the
privileged global-environment information available in simulation, we scale up
the generation of high-quality demonstrations to train the diffusion policy and formulate the
critic value function targets with contrastive negative samples. Our
demonstration generation approach achieves about 2,500 trajectories/GPU per
day, 20 times more efficient than real-world data collection, and results in
a large-scale navigation dataset with 363.2 km of trajectories across 1,244 scenes.
Trained with this simulation dataset, NavDP achieves state-of-the-art
performance and consistently strong generalization capability on
quadruped, wheeled, and humanoid robots in diverse indoor and outdoor
environments. In addition, we present a preliminary attempt at using Gaussian
Splatting for in-domain real-to-sim fine-tuning to further bridge the
sim-to-real gap. Experiments show that adding such real-to-sim data improves
the success rate by 30% without hurting generalization capability.
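The generate-then-select scheme the abstract describes can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the real policy is a diffusion model conditioned on observation tokens from a shared transformer, and the real critic is learned with contrastive negative samples. This toy version only shows the control flow of sampling candidate trajectories and keeping the one the critic ranks highest.

```python
# Illustrative sketch of a "sample many, select by critic" loop.
# All names and the toy sampler/critic are assumptions for illustration.
import math
import random

def sample_trajectories(num_candidates, horizon, seed=0):
    """Stand-in for the diffusion policy: sample candidate 2D waypoint
    sequences. A real implementation would denoise trajectories from
    Gaussian noise, conditioned on encoded local observations."""
    rng = random.Random(seed)
    trajs = []
    for _ in range(num_candidates):
        heading = rng.uniform(-math.pi / 4, math.pi / 4)
        traj = [(t * 0.5 * math.cos(heading), t * 0.5 * math.sin(heading))
                for t in range(1, horizon + 1)]
        trajs.append(traj)
    return trajs

def critic_score(traj, goal):
    """Stand-in critic: negative distance from the final waypoint to the
    goal. The paper's critic is a learned value function, not a heuristic."""
    x, y = traj[-1]
    gx, gy = goal
    return -math.hypot(x - gx, y - gy)

def select_trajectory(goal, num_candidates=16, horizon=8):
    """Generate candidates with the (stubbed) policy, then keep the one
    the critic ranks highest -- the selection step from the abstract."""
    candidates = sample_trajectories(num_candidates, horizon)
    return max(candidates, key=lambda tr: critic_score(tr, goal))

best = select_trajectory(goal=(4.0, 0.0))
print(len(best))  # prints 8: one waypoint per horizon step
```

Because the critic only ranks trajectories, the diffusion policy and the critic can share the same observation encoding, which is the coupling the abstract attributes to the shared policy transformer.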