Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

Abstract

Today's robot policies perform poorly when they must generalize to novel environments. Human corrective feedback is a crucial form of guidance for enabling such generalization. However, adapting to and learning from online human corrections is non-trivial: robots must remember human feedback over time so that they can retrieve the right information in new settings and reduce the intervention rate, and they must also respond to feedback that ranges from arbitrary corrections about high-level human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC can respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from a sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs: it needs only half of the total number of corrections in the first round and requires few or no corrections after two iterations. See https://sites.google.com/stanford.edu/droc for further results, videos, prompts, and code.

