Planning from Observation and Interaction

Abstract

Observational learning requires an agent to learn to perform a task by referencing only observations of the task being performed. This work investigates the equivalent setting in real-world robot learning, where neither hand-designed rewards nor demonstrator actions are assumed to be available. To address this data-constrained setting, this work presents a planning-based Inverse Reinforcement Learning (IRL) algorithm for world modeling from observation and interaction alone. Experiments conducted entirely in the real world demonstrate that this paradigm can learn image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or any data beyond task observations. Moreover, this work demonstrates that the learned world model representation supports online transfer learning in the real world from scratch. Compared with existing approaches that make more restrictive assumptions, including IRL, RL, and Behavior Cloning (BC), the proposed approach achieves significantly greater sample efficiency and higher success rates, offering a practical path forward for online world modeling and planning from observation and interaction. Videos and more at: https://uwrobotlearning.github.io/mpail2/.
