Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

Abstract

Today's robot policies perform poorly when they must generalize to novel environments. Human corrective feedback is a crucial form of guidance for enabling such generalization. However, adapting to and learning from online human corrections is non-trivial: robots must remember human feedback over time so that they can retrieve the right information in new settings and reduce the intervention rate, and they must also respond to feedback that ranges from arbitrary corrections about high-level human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC can respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from a sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs, using only half of the total number of corrections needed in the first round and requiring little to no correction after two iterations. See https://sites.google.com/stanford.edu/droc for further results, videos, prompts, and code.
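To make the retrieval idea concrete, the following is a minimal sketch of looking up stored knowledge by combined textual and visual similarity. It is an illustration under stated assumptions, not the DROC implementation: the `Entry` schema, the `retrieve` function, the equal weighting of the two similarities, the toy two-dimensional embeddings, and the example knowledge strings are all hypothetical, since the abstract does not specify how the similarities are computed or combined.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    """One piece of knowledge distilled from a correction (hypothetical schema)."""
    knowledge: str
    text_vec: list[float]    # embedding of the task or correction text (assumed given)
    visual_vec: list[float]  # embedding of the object image (assumed given)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(entries, text_query, visual_query, text_weight=0.5):
    """Return the entry with the highest weighted text + visual similarity.

    The linear weighting is an assumption made for this sketch.
    """
    def score(entry):
        return (text_weight * cosine(entry.text_vec, text_query)
                + (1.0 - text_weight) * cosine(entry.visual_vec, visual_query))

    return max(entries, key=score) if entries else None


# Toy knowledge base with made-up entries and embeddings.
kb = [
    Entry("open the drawer by pulling the handle straight out", [1.0, 0.0], [0.9, 0.1]),
    Entry("grasp the cup by its rim", [0.0, 1.0], [0.1, 0.9]),
]

best = retrieve(kb, text_query=[0.9, 0.2], visual_query=[1.0, 0.0])
print(best.knowledge)
```

In a real system the vectors would come from text and image encoders, and the weight between the two similarity terms would need tuning for the task at hand.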
