Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Abstract

We present a framework for robot skill acquisition that 1) efficiently scales up the generation of language-labelled robot data and 2) effectively distills this data down into a robust multi-task, language-conditioned visuo-motor policy. For (1), we use a large language model (LLM) to guide high-level planning, and sampling-based robot planners (e.g., motion or grasp samplers) to generate diverse and rich manipulation trajectories. To make this data-collection process more robust, the LLM also infers a code snippet for the success condition of each task. This lets the data-collection process detect failures and retry, and it also labels each trajectory automatically as a success or a failure. For (2), we extend the diffusion policy single-task behavior-cloning approach to multi-task settings with language conditioning. Finally, we propose a new multi-task benchmark with 18 tasks across five domains to test long-horizon behavior, common-sense reasoning, tool use, and intuitive physics. We find that our distilled policy successfully learned the robust retrying behavior of its data-collection policy, while improving absolute success rates by 34.8% on average across the five domains. The benchmark, code, and qualitative results are available on our website: https://www.cs.columbia.edu/~huy/scalingup/
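The data-collection loop described above (attempt, check the success condition, retry on failure, label the trajectory) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `drawer_is_open`, `sample_and_execute`, `collect`, the `drawer_joint` state key, and the 0.15 threshold are all hypothetical stand-ins. The success check is hand-written here, whereas in the framework it would be a snippet inferred by the LLM, and the random "planner" only mimics a sampling-based planner acting on a toy state.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy world state: joint name -> joint value (hypothetical representation).
State = Dict[str, float]


@dataclass
class Trajectory:
    task: str                                   # language label for the episode
    states: List[State] = field(default_factory=list)
    success: bool = False                       # automatic success/failure label


def drawer_is_open(state: State) -> bool:
    """Hypothetical stand-in for an LLM-inferred success-condition snippet."""
    return state["drawer_joint"] > 0.15


def sample_and_execute(state: State, rng: random.Random) -> State:
    """Hypothetical stand-in for a sampling-based planner plus execution.

    Pulls the drawer by a random amount, so a single attempt may fall short.
    """
    new_state = dict(state)
    pull = rng.uniform(0.0, 0.2)
    new_state["drawer_joint"] = min(0.3, state["drawer_joint"] + pull)
    return new_state


def collect(
    task: str,
    state: State,
    is_success: Callable[[State], bool],
    rng: random.Random,
    max_attempts: int = 5,
) -> Trajectory:
    """Attempt the task, retrying until the success check passes or attempts run out."""
    traj = Trajectory(task=task, states=[state])
    for _ in range(max_attempts):
        state = sample_and_execute(state, rng)
        traj.states.append(state)
        if is_success(state):
            traj.success = True   # label the trajectory as a success
            break                 # stop retrying once the condition holds
    return traj


if __name__ == "__main__":
    traj = collect("open the drawer", {"drawer_joint": 0.0}, drawer_is_open, random.Random(0))
    print(traj.task, "attempts:", len(traj.states) - 1, "success:", traj.success)
```

Because failed attempts stay in the recorded trajectory, a policy trained on such data sees examples of recovering from a failure, which is the retrying behavior the abstract reports for the distilled policy.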
