

Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

November 17, 2023
Authors: Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh
cs.AI

Abstract

Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but they also need to respond to arbitrary feedback, ranging from corrections of high-level human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from a sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs, using only half the total number of corrections needed in the first round and requiring few to no corrections after two iterations. We show further results, videos, prompts, and code at https://sites.google.com/stanford.edu/droc .
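As a rough illustration of the retrieval step the abstract describes (matching a new task against a knowledge base of distilled corrections by similarity), the sketch below pairs a toy bag-of-words encoder with cosine similarity. Everything here is an illustrative assumption rather than the authors' implementation: DROC uses LLMs and visual features, while this stand-in encoder, the knowledge-base entries, the `retrieve` helper, and the 0.5 threshold are invented for the example.

```python
# Minimal sketch of similarity-based retrieval over a knowledge base of
# lessons distilled from past corrections. A real system would use learned
# text/image encoders; a bag-of-words vector stands in here so the example
# runs with the standard library only. All names are hypothetical.
from collections import Counter
import math


def embed(text: str) -> Counter:
    """Placeholder encoder: word counts instead of a dense embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Knowledge distilled from earlier corrections: task description -> lesson.
knowledge_base = {
    "put the mug on the shelf": "grasp the mug by its handle",
    "open the top drawer": "pull the handle straight out, slowly",
}


def retrieve(new_task: str, threshold: float = 0.5) -> list[str]:
    """Return lessons whose source tasks are similar to the new task."""
    query = embed(new_task)
    return [
        lesson
        for task, lesson in knowledge_base.items()
        if cosine(query, embed(task)) >= threshold
    ]


print(retrieve("put the cup on the shelf"))
# -> ['grasp the mug by its handle']  (reused knowledge for the new object)
```

In this toy form, a past lesson carries over to a new object instance ("cup" vs. "mug") because the task descriptions remain similar, which is the same intuition behind retrieving distilled corrections in novel settings.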