Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

Abstract

Goal-oriented language-guided navigation requires agents with robust exploration capabilities, so that they can reach specified goals in unknown environments without step-by-step instructions. Existing methods tend to rely exclusively on shortest-path trajectories and therefore lack effective exploration priors for training navigation agents. To address this limitation, we present SID, a goal-oriented language-guided navigation learning approach with Self-Improving Demonstrations. Specifically, SID learns an initial agent on shortest-path data sampled from environments and then uses this agent to generate novel exploration trajectories. These rollouts provide demonstrations with stronger exploration strategies for training a better agent, which in turn produces higher-quality demonstrations for the next round of training. We show that this iterative self-improving pipeline readily scales to new environments, and that the resulting demonstrations transfer across a variety of language-guided navigation tasks, raising the performance ceiling on diverse goal-oriented navigation tasks. Extensive experiments demonstrate that SID significantly improves the exploration capabilities and generalization of navigation agents. The resulting agent achieves new state-of-the-art performance on goal-oriented language-guided navigation tasks, including REVERIE and SOON, notably reaching a 50.9% success rate on the unseen validation splits of SOON and surpassing the prior leading approaches by a margin of 13.9%.
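
The iterative loop described in the abstract (train on shortest paths, roll out the agent, reuse its trajectories as demonstrations, retrain) can be illustrated with a toy sketch. This is not the authors' implementation: the graph, the count-based "agent", the epsilon-style exploration, the rule of keeping only successful rollouts, and the choice to retrain on the pooled demonstrations are all assumptions made for illustration, and every name below is hypothetical.

```python
import random
from collections import defaultdict, deque

# Hypothetical toy environment: node -> list of neighbouring nodes.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_path(start, goal):
    """Breadth-first search; returns the node list from start to goal."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in GRAPH[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def train(demos):
    """Stand-in for training: count (node, goal) -> next-node transitions."""
    policy = defaultdict(lambda: defaultdict(int))
    for path in demos:
        goal = path[-1]
        for node, nxt in zip(path, path[1:]):
            policy[(node, goal)][nxt] += 1
    return policy


def rollout(policy, start, goal, rng, max_steps=10, epsilon=0.3):
    """Follow the policy; act randomly with probability epsilon or when unsure."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        node = path[-1]
        counts = policy.get((node, goal))
        if counts and rng.random() > epsilon:
            nxt = max(counts, key=counts.get)
        else:
            nxt = rng.choice(GRAPH[node])
        path.append(nxt)
    return path


def self_improve(rounds=3, rollouts_per_round=50, seed=0):
    rng = random.Random(seed)
    nodes = sorted(GRAPH)
    pairs = [(s, g) for s in nodes for g in nodes if s != g]
    # Round 0: shortest-path demonstrations for a small subset of pairs.
    demos = [shortest_path(s, g) for s, g in pairs[:5]]
    agent = train(demos)
    history = [len(demos)]
    for _ in range(rounds):
        for _ in range(rollouts_per_round):
            s, g = rng.choice(pairs)
            path = rollout(agent, s, g, rng)
            if path[-1] == g:  # assumption: keep only successful rollouts
                demos.append(path)
        agent = train(demos)  # assumption: retrain on the pooled demos
        history.append(len(demos))
    return agent, demos, history
```

Calling `self_improve()` returns the final toy agent, the accumulated demonstrations, and the demonstration count after each round. In a real system the counting step would be replaced by training a navigation model, and the success filter by whatever selection criterion the method actually uses.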
