Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments

Abstract

Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that, given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to fine-tune robot models for every new environment stands in stark contrast to models in language or vision, which can be deployed zero-shot for open-world problems. In this work, we present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that can directly generalize to new environments without any fine-tuning. To create RUMs efficiently, we develop new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, a cheap commodity robot, with an external multimodal large language model (mLLM) verifier for retrying. We train five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. On average, our system achieves a 90% success rate in unseen, novel environments while interacting with unseen objects. Moreover, the utility models can also succeed in different robot and camera set-ups with no further data, training, or fine-tuning. Primary among our lessons are the importance of training data over training algorithm and policy class, guidance about data scaling, the necessity of diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying to improve performance on individual environments. Our code, data, models, hardware designs, and experiment and deployment videos are open-sourced and can be found on our project website: https://robotutilitymodels.com
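
The abstract mentions deploying policies with an external verifier that triggers retries. As a rough illustration only, the sketch below shows one way such a verify-and-retry loop could be structured. It is not the authors' implementation: the names `run_with_retries`, `run_policy`, `verify`, and `reset`, and the `max_attempts` default, are all hypothetical, and the verifier is modeled as a plain callable rather than a real mLLM call.

```python
from typing import Callable, Tuple


def run_with_retries(
    run_policy: Callable[[], str],   # hypothetical: executes one rollout, returns an observation summary
    verify: Callable[[str], bool],   # hypothetical: stands in for an external verifier's success judgment
    reset: Callable[[], None],       # hypothetical: returns the robot to a start pose before retrying
    max_attempts: int = 3,           # assumed retry budget, not a value from the paper
) -> Tuple[bool, int]:
    """Run a policy, check the outcome with a verifier, and retry on failure.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        observation = run_policy()
        if verify(observation):
            return True, attempt
        # Only reset when another attempt will actually follow.
        if attempt < max_attempts:
            reset()
    return False, max_attempts


if __name__ == "__main__":
    # Stub rollout that fails twice and then succeeds.
    outcomes = iter(["fail", "fail", "success"])
    succeeded, attempts = run_with_retries(
        run_policy=lambda: next(outcomes),
        verify=lambda obs: obs == "success",
        reset=lambda: None,
        max_attempts=5,
    )
    print(succeeded, attempts)
```

Keeping the verifier behind a simple callable means the same loop can be exercised with stubs, as here, before any real model or robot is attached.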
