UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

March 13, 2025
Authors: Hang Yin, Xiuwei Xu, Lingqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
cs.AI

Abstract

In this paper, we propose a general framework for universal zero-shot goal-oriented navigation. Existing zero-shot methods build task-specific inference frameworks upon large language models (LLMs), which differ considerably in their overall pipelines and fail to generalize across different types of goals. Toward universal zero-shot navigation, we propose a uniform graph representation that unifies different goal types, including object category, instance image, and text description. We also convert the agent's observations into an online-maintained scene graph. With this consistent scene and goal representation, we preserve more structural information than pure text and can leverage an LLM for explicit graph-based reasoning. Specifically, we perform graph matching between the scene graph and the goal graph at each time step and propose different strategies to generate the long-term exploration goal according to the matching state. When there is zero matching, the agent iteratively searches for a subgraph of the goal; with partial matching, it uses coordinate projection and anchor pair alignment to infer the goal location; finally, with perfect matching, scene graph correction and goal verification are applied. We also present a blacklist mechanism to enable robust switching between stages. Extensive experiments on several benchmarks show that our UniGoal achieves state-of-the-art zero-shot performance on the three studied navigation tasks with a single model, even outperforming task-specific zero-shot methods and supervised universal methods.
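The staged dispatch described in the abstract (zero, partial, and perfect matching, with a blacklist enabling fall-back between stages) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `MatchResult` structure, the matching ratio, the stage names, and the blacklist handling are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class MatchResult:
    """Hypothetical summary of a scene-graph/goal-graph matching step."""
    matched_goal_nodes: set   # goal-graph nodes found in the scene graph
    total_goal_nodes: int     # size of the goal graph

def select_stage(match: MatchResult, blacklist: set) -> str:
    """Choose an exploration strategy from the current matching state.

    Blacklisted nodes (e.g. matches that repeatedly failed to lead to
    the goal) are discounted, which lets the agent fall back to an
    earlier stage instead of getting stuck -- a rough analogue of the
    paper's blacklist mechanism.
    """
    effective = match.matched_goal_nodes - blacklist
    ratio = len(effective) / max(match.total_goal_nodes, 1)
    if ratio == 0.0:
        # Zero matching: iteratively search for a subgraph of the goal.
        return "search_goal_subgraph"
    if ratio < 1.0:
        # Partial matching: infer the goal location via coordinate
        # projection and anchor pair alignment.
        return "infer_goal_location"
    # Perfect matching: apply scene graph correction and goal verification.
    return "verify_goal"
```

For example, with a two-node goal graph fully matched, `select_stage` returns the verification stage; blacklisting one of the matched nodes drops the agent back to partial-matching inference.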
