Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

Abstract



Goal-oriented language-guided navigation requires robust exploration capabilities: agents must reach specified goals in unknown environments without step-by-step instructions. Existing methods train navigation agents almost exclusively on shortest-path trajectories and therefore lack effective exploration priors. To address this limitation, we present SID, a goal-oriented language-guided navigation learning approach with Self-Improving Demonstrations. Specifically, SID first trains an initial agent on shortest-path data sampled from environments and then uses this agent to generate novel exploration trajectories. These rollouts provide demonstrations with stronger exploration strategies for training a better agent, which in turn produces higher-quality demonstrations for the next round of training. We show that this iterative self-improving pipeline readily scales to new environments and that the resulting demonstrations transfer across a variety of language-guided navigation tasks, raising the performance ceiling on diverse goal-oriented navigation tasks. Extensive experiments demonstrate that SID significantly improves the exploration capabilities and generalization of navigation agents. The resulting agent achieves new state-of-the-art performance on goal-oriented language-guided navigation tasks, including REVERIE and SOON, notably reaching a 50.9% success rate on the unseen validation splits of SOON and surpassing the prior leading approaches by a margin of 13.9%.
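The abstract describes the self-improvement loop only at a high level. The following is a toy illustration of that loop, not SID's implementation. The names `self_improving_demos` and `toy_rollout`, and all plug-in callables, are hypothetical. Three details are assumptions made here rather than statements from the abstract: only rollouts that reach the goal become the next round's demonstrations; each round's demonstrations replace the previous ones; and shortest-path data comes from a separate set of seed environments, while rollouts run on a possibly larger set.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Env = TypeVar("Env")
Traj = TypeVar("Traj")
Agent = TypeVar("Agent")


def self_improving_demos(
    seed_envs: Sequence[Env],
    envs: Sequence[Env],
    shortest_path: Callable[[Env], Traj],
    train: Callable[[List[Traj]], Agent],
    rollout: Callable[[Agent, Env], Traj],
    succeeded: Callable[[Env, Traj], bool],
    rounds: int,
) -> Tuple[Agent, List[List[Traj]]]:
    """Toy self-improvement loop; every callable is a hypothetical plug-in."""
    # Round 0: train the initial agent on shortest-path demonstrations.
    demos = [shortest_path(env) for env in seed_envs]
    agent = train(demos)
    history = [demos]
    for _ in range(rounds):
        # The current agent explores each environment without shortest-path labels.
        rollouts = [(env, rollout(agent, env)) for env in envs]
        # Assumption: keep only goal-reaching rollouts as the next demonstrations.
        demos = [traj for env, traj in rollouts if succeeded(env, traj)]
        if not demos:
            break  # no usable demonstrations this round; keep the current agent
        agent = train(demos)
        history.append(demos)
    return agent, history


# Hypothetical toy world: an environment is a goal distance g, a trajectory is
# the list of visited positions, and an "agent" is the farthest distance it has seen.
def toy_rollout(reach: int, goal: int) -> List[int]:
    # The agent explores one step beyond what its demonstrations covered.
    return list(range(min(goal, reach + 1) + 1))


agent, history = self_improving_demos(
    seed_envs=[1],
    envs=[1, 2, 3, 4],
    shortest_path=lambda g: list(range(g + 1)),
    train=lambda demos: max(t[-1] for t in demos),
    rollout=toy_rollout,
    succeeded=lambda g, t: t[-1] == g,
    rounds=3,
)
print(agent, [len(d) for d in history])  # 4 [1, 2, 3, 4]
```

In this toy, the agent's reach grows by one step per round, and the number of usable demonstrations goes 1, 2, 3, 4. This mirrors, in miniature, the claim that each round's rollouts yield a better agent. Real agents, environments, and success criteria are far richer than this sketch.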
