Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale
September 29, 2025
Authors: Songze Li, Zun Wang, Gengze Zhou, Jialu Li, Xiangyu Zeng, Limin Wang, Yu Qiao, Qi Wu, Mohit Bansal, Yi Wang
cs.AI
Abstract
Goal-oriented language-guided navigation requires robust exploration
capabilities for agents to navigate to specified goals in unknown environments
without step-by-step instructions. Existing methods tend to rely exclusively on
shortest-path trajectories and thus lack effective exploration priors for
training navigation agents. To address this challenge, we present SID, a
goal-oriented language-guided navigation learning approach with Self-Improving
Demonstrations. Specifically, SID learns an initial agent on the shortest-path
data sampled from environments and then leverages this agent to generate novel
exploration trajectories. The novel rollouts provide demonstrations with
stronger exploration strategies to train a better agent, which in turn produces
higher-quality agent demonstrations for the next round of training. We show
that this iterative self-improving pipeline readily scales to new environments,
and the resulting demonstrations can be transferred across a variety of
language-guided navigation tasks, elevating the performance ceiling in diverse
goal-oriented navigation tasks. Extensive experiments demonstrate that SID
significantly boosts the exploration capabilities and generalization of
navigation agents. The resulting agent achieves new state-of-the-art
performance on goal-oriented language-guided navigation tasks, including
REVERIE and SOON, notably reaching a 50.9% success rate on the unseen validation
splits of SOON, surpassing the prior leading approaches by a margin of 13.9%.
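The iterative loop summarized above (train on shortest-path data, let the agent explore, retrain on its rollouts, repeat) can be sketched generically. This is a minimal illustration under stated assumptions, not SID's actual implementation: the function name `self_improving_demonstrations`, the `keep` filter (standing in for something like "keep only successful rollouts"), and retraining on the new rollouts alone are hypothetical choices. In the toy usage, an "agent" is a single skill number and a "trajectory" is a score.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

A = TypeVar("A")  # agent / policy type (placeholder)
T = TypeVar("T")  # trajectory type (placeholder)


def self_improving_demonstrations(
    shortest_path_data: Sequence[T],
    train: Callable[[Sequence[T]], A],
    rollout: Callable[[A], List[T]],
    keep: Callable[[T], bool],
    rounds: int,
) -> Tuple[A, List[List[T]]]:
    """Hypothetical sketch of an iterative self-improving demonstration loop.

    Round 0 trains on shortest-path data; each later round trains a new agent
    on the filtered rollouts of the previous agent (an assumed design choice).
    """
    agent = train(shortest_path_data)
    history: List[List[T]] = []
    for _ in range(rounds):
        # The current agent explores and produces candidate demonstrations.
        demos = [traj for traj in rollout(agent) if keep(traj)]
        if not demos:
            break  # nothing usable was generated; keep the current agent
        history.append(demos)
        # A new agent is trained on the newly generated demonstrations.
        agent = train(demos)
    return agent, history


# Toy usage: an "agent" is a skill value and a "trajectory" is a score.
def toy_train(demos: Sequence[float]) -> float:
    return sum(demos) / len(demos)


def toy_rollout(skill: float) -> List[float]:
    return [skill + delta for delta in (-0.5, 0.0, 0.5, 1.0)]


final_skill, rounds_of_demos = self_improving_demonstrations(
    shortest_path_data=[1.0, 1.0, 1.0],
    train=toy_train,
    rollout=toy_rollout,
    keep=lambda score: score >= 1.0,  # hypothetical stand-in for "rollout succeeded"
    rounds=3,
)
print(final_skill)  # 2.0
```

In the toy run, each round's agent produces higher-scoring rollouts than the last (skill 1.0, then 1.5, 1.75, 2.0), mirroring the feedback loop described in the abstract. Whether new demonstrations should replace or be mixed with the original shortest-path data is left open in this sketch.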