Self-Adapting Improvement Loops for Robotic Learning

Abstract

Video generative models trained on expert demonstrations have been used as effective text-conditioned visual planners for solving robotic tasks. However, generalization to unseen tasks remains a challenge. Although generalization may be improved by leveraging prior knowledge learned from additional pre-collected offline data sources, such as web-scale video datasets, in the era of experience we aim to design agents that can continuously improve online from self-collected behavior. In this work we therefore propose the Self-Adapting Improvement Loop (SAIL), in which an in-domain video model iteratively updates itself on self-produced trajectories, collected through adaptation with an internet-scale pretrained video model, and steadily improves its performance on a specified task of interest. We apply SAIL to a diverse suite of MetaWorld tasks, as well as two manipulation tasks on a real robot arm, and find that performance improvements continuously emerge over multiple iterations for novel tasks that were unseen during the original in-domain video model training. Furthermore, we find that SAIL is surprisingly robust to whether and how the self-collected experience is filtered, and to the quality of the initial in-domain demonstrations. Through adaptation with internet-scale data, as summarized by a pretrained video model, and learning from online experience, we thus demonstrate a way to iteratively bootstrap a high-performance video model for solving novel robotic tasks through self-improvement.
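The structure of the loop described above (plan with an adapted model, collect self-produced experience, optionally filter it, refit the in-domain model, repeat) can be illustrated with a deliberately tiny toy. This is a hypothetical sketch, not the authors' implementation: each "video model" is replaced by a 1-D Gaussian over a plan parameter, adaptation with the pretrained model is assumed to be a product of the two densities (equivalently, a sum of their scores), the environment is a hypothetical `rollout` function with a fixed success tolerance, and "finetuning" is a maximum-likelihood refit on the outcomes of the executed rollouts. The names `Gaussian`, `compose`, `rollout`, and `sail` are illustrative only.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Hypothetical stand-in for a video model: a 1-D Gaussian over a plan parameter."""
    mean: float
    std: float


def compose(a: Gaussian, b: Gaussian) -> Gaussian:
    """Product of two Gaussian densities (an assumed adaptation scheme).

    Multiplying densities is the same as adding their scores, so the result
    is the precision-weighted combination of the two models.
    """
    pa, pb = 1.0 / a.std**2, 1.0 / b.std**2
    var = 1.0 / (pa + pb)
    return Gaussian(mean=var * (pa * a.mean + pb * b.mean), std=var**0.5)


def rollout(plan: float, goal: float, rng: random.Random,
            noise: float = 0.1, tol: float = 0.25) -> tuple[float, bool]:
    """Hypothetical environment: execution lands near the plan; success if close to the goal."""
    outcome = plan + rng.gauss(0.0, noise)
    return outcome, abs(outcome - goal) < tol


def sail(in_domain: Gaussian, pretrained: Gaussian, goal: float,
         iterations: int = 5, rollouts: int = 200,
         filter_success: bool = True, seed: int = 0) -> tuple[Gaussian, list[float]]:
    """Toy improvement loop; returns the final in-domain model and per-iteration success rates."""
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        # 1. Plan with the in-domain model adapted by the pretrained prior.
        planner = compose(in_domain, pretrained)
        data, successes = [], 0
        for _ in range(rollouts):
            plan = rng.gauss(planner.mean, planner.std)
            outcome, ok = rollout(plan, goal, rng)
            successes += ok
            # 2. Collect self-produced experience, optionally keeping only successes.
            if ok or not filter_success:
                data.append(outcome)
        history.append(successes / rollouts)
        # 3. "Finetune" the in-domain model on its own experience (MLE refit, std floored).
        if len(data) >= 2:
            in_domain = Gaussian(statistics.fmean(data), max(statistics.stdev(data), 0.05))
    return in_domain, history


if __name__ == "__main__":
    # In-domain model trained on other tasks (mean 0.0); the pretrained prior roughly knows the new task.
    model, rates = sail(Gaussian(0.0, 0.5), Gaussian(1.0, 0.5), goal=1.0)
    print(f"success per iteration: {[round(r, 2) for r in rates]}")
    print(f"final in-domain model: mean={model.mean:.2f}, std={model.std:.2f}")
```

With `filter_success=True` only successful rollouts are kept for the refit; setting it to `False` refits on every rollout. In this toy both variants raise the success rate across iterations (the unfiltered one more slowly), because each round re-composes the refit in-domain model with the pretrained prior. That is a property of the toy and should not be read as evidence for the robustness result reported above.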
