Self-Adapting Improvement Loops for Robotic Learning

Abstract

Video generative models trained on expert demonstrations have been used as performant text-conditioned visual planners for solving robotic tasks. However, generalization to unseen tasks remains a challenge. Improved generalization may be facilitated by leveraging prior knowledge learned from additional pre-collected offline data sources, such as web-scale video datasets; in the era of experience, however, we aim to design agents that can continuously improve in an online manner from self-collected behaviors. In this work we therefore propose the Self-Adapting Improvement Loop (SAIL), in which an in-domain video model iteratively updates itself on self-produced trajectories, collected through adaptation with an internet-scale pretrained video model, and steadily improves its performance on a specified task of interest. We apply SAIL to a diverse suite of MetaWorld tasks, as well as two manipulation tasks on a real robot arm, and find that performance improvements continue to emerge over multiple iterations for novel tasks unseen during the original in-domain video model training. Furthermore, we find that SAIL is surprisingly robust to whether and how the self-collected experience is filtered, and to the quality of the initial in-domain demonstrations. Through adaptation with summarized internet-scale data and learning through online experience, we thus demonstrate a way to iteratively bootstrap a high-performance video model for solving novel robotic tasks through self-improvement.
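
The loop described in the abstract (collect trajectories with the adapted model, optionally filter them, then update the in-domain model on what remains) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sail_loop`, `toy_collect`, and `toy_finetune` are hypothetical, the "model" is reduced to a single scalar skill value, and each "trajectory" is a scalar quality score. The fixed boost in `toy_collect` is only an assumed stand-in for the effect of adapting with a pretrained video model.

```python
from typing import Callable, List, Optional, Tuple


def sail_loop(
    model: float,
    collect: Callable[[float], List[float]],
    finetune: Callable[[float, List[float]], float],
    num_iterations: int,
    keep: Optional[Callable[[float], bool]] = None,
) -> Tuple[float, List[float]]:
    """Run a collect -> (optional) filter -> finetune loop.

    All arguments are hypothetical stand-ins: `model` is a scalar "skill"
    in place of a video model, and each trajectory is a scalar quality score.
    """
    history = [model]
    for _ in range(num_iterations):
        # 1. Collect self-produced trajectories with the current model.
        trajectories = collect(model)
        # 2. Optionally filter the collected experience.
        if keep is not None:
            trajectories = [t for t in trajectories if keep(t)]
        # 3. Update the in-domain model on the (filtered) experience.
        model = finetune(model, trajectories)
        history.append(model)
    return model, history


def toy_collect(skill: float) -> List[float]:
    # Assumption: adaptation with a pretrained prior adds a small boost
    # to some rollouts, capped at a maximum quality of 1.0.
    return [min(1.0, skill + 0.2), min(1.0, skill + 0.1), skill]


def toy_finetune(skill: float, trajectories: List[float]) -> float:
    # Move the skill halfway toward the mean quality of the collected data;
    # with no data, leave the model unchanged.
    if not trajectories:
        return skill
    mean_quality = sum(trajectories) / len(trajectories)
    return 0.5 * skill + 0.5 * mean_quality


if __name__ == "__main__":
    final_skill, skills = sail_loop(0.2, toy_collect, toy_finetune, num_iterations=3)
    print("skill per iteration:", skills)
```

Passing `keep=None` corresponds to using all self-collected experience without filtering, while a predicate such as `lambda q: q > 0.3` keeps only higher-quality rollouts. The toy only shows the control flow and makes no claim about how filtering affects results in the actual method.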
