NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
May 13, 2025
Authors: Wenzhe Cai, Jiaqi Peng, Yuqiang Yang, Yujian Zhang, Meng Wei, Hanqing Wang, Yilun Chen, Tai Wang, Jiangmiao Pang
cs.AI
Abstract
Learning navigation in dynamic open-world environments is an important yet
challenging skill for robots. Most previous methods rely on precise
localization and mapping, or learn from expensive real-world demonstrations. In
this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end
framework that is trained solely in simulation and transfers zero-shot to
different embodiments in diverse real-world environments. The key ingredient of
NavDP's network is the combination of diffusion-based trajectory generation and
a critic function for trajectory selection, both conditioned only on local
observation tokens encoded by a shared policy transformer. Leveraging the
privileged information about the global environment available in simulation, we
scale up high-quality demonstrations to train the diffusion policy and
formulate the critic value-function targets with contrastive negative samples.
Our demonstration generation approach yields about 2,500 trajectories per GPU
per day, roughly 20x more efficient than real-world data collection, and
produces a large-scale navigation dataset with 363.2 km of trajectories across
1,244 scenes. Trained on this simulation dataset, NavDP achieves
state-of-the-art performance and consistently strong generalization on
quadruped, wheeled, and humanoid robots in diverse indoor and outdoor
environments. In addition, we present a preliminary attempt at using Gaussian
Splatting for in-domain real-to-sim fine-tuning to further bridge the
sim-to-real gap. Experiments show that adding such real-to-sim data can improve
the success rate by 30% without hurting generalization capability.
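The abstract's generate-then-select pattern (a diffusion head proposes several candidate trajectories from local observation tokens; a learned critic scores them and the best is executed) can be sketched in a few lines. This is only an illustrative skeleton, not the authors' implementation: `sample_trajectories` and `critic_score` are hypothetical stand-ins (random waypoints and a hand-written progress score) for NavDP's learned diffusion and critic heads.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(obs_tokens, num_candidates=8, horizon=16):
    """Stand-in for the diffusion head: draw random 2-D waypoint
    sequences. NavDP would instead denoise candidates conditioned
    on the shared-transformer observation tokens."""
    return rng.normal(size=(num_candidates, horizon, 2))

def critic_score(obs_tokens, trajectory):
    """Stand-in for the learned critic (trained in the paper with
    contrastive negative samples): here, just reward forward
    progress along the x-axis."""
    return float(trajectory[:, 0].sum())

def select_trajectory(obs_tokens):
    """Generate candidates, score each, and return the best one."""
    candidates = sample_trajectories(obs_tokens)
    scores = [critic_score(obs_tokens, t) for t in candidates]
    return candidates[int(np.argmax(scores))]

obs = np.zeros(32)  # placeholder for encoded observation tokens
best = select_trajectory(obs)
print(best.shape)  # (16, 2): horizon x (x, y) waypoints
```

The design choice this illustrates is that the policy never commits to a single sampled trajectory: sampling several and letting a critic filter them trades a small amount of compute for robustness to bad diffusion samples.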