Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation

Abstract

What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment from the perspective of a robot can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than generalization to others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Toward an answer to this question, we study imitation learning policies in simulation and on a real-robot language-conditioned manipulation task to quantify the difficulty of generalization to different (sets of) factors. We also design a new simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors based on generalization difficulty that is consistent across simulation and our real robot setup.
