SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning
July 12, 2023
Authors: Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, Niko Suenderhauf
cs.AI
Abstract
Large language models (LLMs) have demonstrated impressive results in
developing generalist planning agents for diverse tasks. However, grounding
these plans in expansive, multi-floor, and multi-room environments presents a
significant challenge for robotics. We introduce SayPlan, a scalable approach
to LLM-based, large-scale task planning for robotics using 3D scene graph
(3DSG) representations. To ensure the scalability of our approach, we: (1)
exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic
search for task-relevant subgraphs from a smaller, collapsed representation of
the full graph; (2) reduce the planning horizon for the LLM by integrating a
classical path planner; and (3) introduce an iterative replanning pipeline that
refines the initial plan using feedback from a scene graph simulator,
correcting infeasible actions and avoiding planning failures. We evaluate our
approach on two large-scale environments spanning up to 3 floors, 36 rooms and
140 objects, and show that our approach is capable of grounding large-scale,
long-horizon task plans from abstract and natural-language instructions for a
mobile manipulator robot to execute.
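The collapse/expand mechanism behind point (1) can be illustrated with a toy sketch. The snippet below is not the authors' implementation: the node classes, the `collapsed_view`/`semantic_search` names, and the keyword-matching heuristic standing in for the LLM's semantic search are all assumptions made for illustration. It shows the core idea of keeping the full 3D scene graph collapsed and exposing only a small, task-relevant subgraph.

```python
# Illustrative sketch (NOT the SayPlan implementation): a toy hierarchical
# scene graph that starts fully collapsed and is expanded node-by-node,
# so a planner only ever sees a small task-relevant subgraph.

class SceneGraphNode:
    def __init__(self, name, node_type, children=None):
        self.name = name
        self.node_type = node_type       # e.g. "floor", "room", "object"
        self.children = children or []
        self.expanded = False            # collapsed by default

    def collapsed_view(self):
        """Return only the portion of the graph currently visible."""
        view = {"name": self.name, "type": self.node_type}
        if self.expanded and self.children:
            view["children"] = [c.collapsed_view() for c in self.children]
        return view


def semantic_search(root, keywords):
    """Stand-in for the LLM's semantic search: expand nodes whose names
    match task keywords (floors always expand to expose their rooms)."""
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if node.node_type == "floor" or any(k in node.name for k in keywords):
            node.expanded = True
            frontier.extend(node.children)
    return root.collapsed_view()


# A tiny two-room environment (hypothetical names).
kitchen = SceneGraphNode("kitchen", "room",
                         [SceneGraphNode("coffee_mug", "object")])
office = SceneGraphNode("office", "room",
                        [SceneGraphNode("printer", "object")])
floor1 = SceneGraphNode("floor_1", "floor", [kitchen, office])

# For a task like "bring me the mug from the kitchen", only the kitchen
# subtree is expanded; the office stays collapsed.
subgraph = semantic_search(floor1, keywords=["kitchen", "mug"])
```

Because irrelevant rooms stay collapsed, the textual representation handed to the LLM stays small even as the environment grows to many floors and rooms, which is what makes the approach scalable.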