SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning
July 12, 2023
Authors: Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, Niko Suenderhauf
cs.AI
Abstract
Large language models (LLMs) have demonstrated impressive results in
developing generalist planning agents for diverse tasks. However, grounding
these plans in expansive, multi-floor, and multi-room environments presents a
significant challenge for robotics. We introduce SayPlan, a scalable approach
to LLM-based, large-scale task planning for robotics using 3D scene graph
(3DSG) representations. To ensure the scalability of our approach, we: (1)
exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic
search for task-relevant subgraphs from a smaller, collapsed representation of
the full graph; (2) reduce the planning horizon for the LLM by integrating a
classical path planner; and (3) introduce an iterative replanning pipeline that
refines the initial plan using feedback from a scene graph simulator,
correcting infeasible actions and avoiding planning failures. We evaluate our
approach on two large-scale environments spanning up to 3 floors, 36 rooms and
140 objects, and show that our approach is capable of grounding large-scale,
long-horizon task plans from abstract and natural language instructions for a
mobile manipulator robot to execute.
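The semantic search over a collapsed 3DSG described in point (1) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names `SceneGraph`, `expand`, `contract`, and `semantic_search` are assumptions, and the LLM's relevance judgement is stubbed with a simple predicate. The idea is that the full graph is presented in a collapsed form (floors and rooms only), and rooms are expanded one at a time, kept only if their contents are task-relevant, so the representation handed to the LLM stays small.

```python
# Illustrative sketch of SayPlan-style semantic search over a collapsed
# hierarchical scene graph. All names here are hypothetical.

class SceneGraph:
    def __init__(self, hierarchy):
        # hierarchy: {floor: {room: [object, ...]}}
        self.hierarchy = hierarchy
        self.expanded = set()  # rooms whose objects are currently visible

    def collapsed_view(self):
        """View showing floors and rooms; objects only for expanded rooms."""
        return {
            floor: {
                room: (objs if room in self.expanded else "<collapsed>")
                for room, objs in rooms.items()
            }
            for floor, rooms in self.hierarchy.items()
        }

    def expand(self, room):
        self.expanded.add(room)

    def contract(self, room):
        self.expanded.discard(room)


def semantic_search(graph, is_relevant):
    """Expand one room at a time; keep it expanded only if relevant.

    `is_relevant(room, objects)` stands in for the LLM's judgement,
    which in SayPlan decides whether a subgraph matters for the task."""
    relevant = {}
    for floor, rooms in graph.hierarchy.items():
        for room, objs in rooms.items():
            graph.expand(room)
            if is_relevant(room, objs):
                relevant[room] = objs
            else:
                graph.contract(room)  # keep the token footprint small
    return relevant


if __name__ == "__main__":
    g = SceneGraph({"floor1": {"kitchen": ["mug", "sink"],
                               "office": ["desk"]}})
    found = semantic_search(g, lambda room, objs: "mug" in objs)
    print(found)             # only the kitchen subgraph is retained
    print(g.collapsed_view())
```

In the paper this search is driven by the LLM itself and refined by simulator feedback; the predicate here merely shows where that judgement plugs in.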