Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling

Abstract

Task planning is a major challenge for autonomous robots: it requires translating high-level task descriptions into long-horizon action sequences. Despite recent advances in language model agents, they remain prone to planning errors and limited in their ability to plan ahead. To address these limitations in robotic planning, we advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached. Remarkably, this process can be optimized end-to-end from an analytical perspective without the need to curate additional verifiers or reward models, allowing us to train self-refining planners in a simple supervised learning fashion. Meanwhile, a nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning that incorporates useful feedback from the environment (or an internal world model). Our method is evaluated on the VirtualHome-Env benchmark, showing advanced performance and better scaling with inference-time computation. Code is available at https://github.com/Singularity0104/equilibrium-planner.
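
The self-refining scheme described in the abstract can be read as a fixed-point iteration: a refiner is applied to a draft plan repeatedly until the plan stops changing. The following is only a minimal sketch of that idea, not the authors' implementation. The names `refine_to_equilibrium` and `toy_refine`, the `max_iters` cap, and the hard-coded reference plan are all assumptions made for illustration; in the paper the refiner is a language model, and the training procedure is not shown here.

```python
from typing import Callable, List

Plan = List[str]


def refine_to_equilibrium(
    refine: Callable[[str, Plan], Plan],
    task: str,
    draft: Plan,
    max_iters: int = 10,
) -> Plan:
    """Apply `refine` repeatedly until the plan no longer changes.

    A plan that satisfies refine(task, plan) == plan is a fixed point
    (an "equilibrium"). `max_iters` is an assumed safety cap.
    """
    plan = draft
    for _ in range(max_iters):
        new_plan = refine(task, plan)
        if new_plan == plan:  # equilibrium reached
            break
        plan = new_plan
    return plan


def toy_refine(task: str, plan: Plan) -> Plan:
    """Hypothetical stand-in for a learned refiner.

    It corrects the first step that deviates from a hard-coded reference
    plan, one step per call, so several refinement rounds are needed.
    """
    target = ["walk to fridge", "open fridge", "grab milk", "close fridge"]
    for i, step in enumerate(target):
        if i >= len(plan) or plan[i] != step:
            return target[: i + 1] + plan[i + 1:]
    return target


final_plan = refine_to_equilibrium(toy_refine, "get milk", ["grab milk"])
print(final_plan)
```

Raising `max_iters` allows more refinement rounds on harder drafts, which is one simple way to picture how such a planner could trade extra inference computation for plan quality.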
