Self-Adapting Improvement Loops for Robotic Learning

Abstract

Video generative models trained on expert demonstrations have been used as performant text-conditioned visual planners for solving robotic tasks. However, generalization to unseen tasks remains a challenge. Although generalization may be improved by leveraging prior knowledge learned from additional pre-collected offline data sources, such as web-scale video datasets, in the era of experience we aim to design agents that can continuously improve online from self-collected behaviors. In this work we therefore propose the Self-Adapting Improvement Loop (SAIL), in which an in-domain video model iteratively updates itself on self-produced trajectories, collected through adaptation with an internet-scale pretrained video model, and steadily improves its performance on a specified task of interest. We apply SAIL to a diverse suite of MetaWorld tasks, as well as two manipulation tasks on a real robot arm, and find that performance continues to improve over multiple iterations on novel tasks that were unseen during the original in-domain video model training. Furthermore, we find that SAIL is surprisingly robust to whether and how the self-collected experience is filtered, and to the quality of the initial in-domain demonstrations. Through adaptation with summarized internet-scale data and learning from online experience, we thus demonstrate a way to iteratively bootstrap a high-performance video model for solving novel robotic tasks through self-improvement.
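The loop described above can be outlined roughly as follows. This is only a structural sketch based on the abstract's wording, not the authors' implementation: the function name `sail_loop` and the injected callables `adapt`, `finetune`, and `keep` are hypothetical placeholders, and how adaptation, rollout, filtering, and fine-tuning actually work is left entirely to the caller.

```python
from typing import Callable, List, Optional, TypeVar

Model = TypeVar("Model")  # hypothetical: whatever represents the in-domain video model
Traj = TypeVar("Traj")    # hypothetical: whatever represents one self-collected trajectory


def sail_loop(
    in_domain_model: Model,
    pretrained_model: object,
    adapt: Callable[[Model, object], Callable[[str], Traj]],
    finetune: Callable[[Model, List[Traj]], Model],
    task_prompt: str,
    num_iterations: int,
    rollouts_per_iteration: int,
    keep: Optional[Callable[[Traj], bool]] = None,
) -> Model:
    """Iteratively update an in-domain model on its own collected trajectories.

    All callables are assumptions supplied by the caller:
      adapt    -- combines the in-domain and pretrained models into a collector
                  that maps a task prompt to one trajectory
      finetune -- returns an updated in-domain model given new trajectories
      keep     -- optional filter; None means no filtering is applied
    """
    for _ in range(num_iterations):
        # Adapt the current in-domain model with the pretrained model.
        collect = adapt(in_domain_model, pretrained_model)

        # Self-collect experience for the task of interest.
        trajectories = [collect(task_prompt) for _ in range(rollouts_per_iteration)]

        # Optionally filter the collected experience.
        if keep is not None:
            trajectories = [t for t in trajectories if keep(t)]

        # Update the in-domain model on what was collected.
        in_domain_model = finetune(in_domain_model, trajectories)

    return in_domain_model
```

Passing `keep=None` corresponds to using all self-collected experience without filtering, which is one of the settings the abstract says SAIL is robust to.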
