Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation
July 7, 2023
Authors: Annie Xie, Lisa Lee, Ted Xiao, Chelsea Finn
cs.AI
Abstract
What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment, from the perspective of a robot, can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Towards an answer to this question, we study imitation learning policies in simulation and on a real-robot, language-conditioned manipulation task to quantify the difficulty of generalizing to different (sets of) factors. We also design a new simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors based on generalization difficulty that is consistent across simulation and our real-robot setup.
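
The evaluation protocol implied by the abstract, shifting one factor of variation at a time and measuring the drop in success rate relative to the training distribution, can be summarized in a short sketch. The snippet below is illustrative only: the factor names, the `make_env` environment factory, and the `policy.act` interface are assumptions made for this example, not the paper's actual API.

```python
from statistics import mean

# Illustrative factors of variation, following the abstract's examples
# (lighting, camera placement); the full list here is hypothetical.
FACTORS = ["lighting", "camera_pose", "table_texture", "distractors", "object_color"]

def evaluate(policy, env, episodes=50):
    """Return the mean task success rate of `policy` in `env`."""
    successes = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            obs, reward, done, info = env.step(policy.act(obs))
        successes.append(float(info.get("success", False)))
    return mean(successes)

def per_factor_generalization_gap(policy, make_env):
    """Shift one factor at a time and measure the drop in success rate.

    `make_env(shifted_factors)` is an assumed environment factory that
    samples unseen values for the named factors while keeping all other
    factors at their training-distribution values.
    """
    in_dist = evaluate(policy, make_env(shifted_factors=[]))
    gaps = {}
    for factor in FACTORS:
        shifted = evaluate(policy, make_env(shifted_factors=[factor]))
        gaps[factor] = in_dist - shifted
    # Sort from largest to smallest gap, i.e. hardest to easiest factor.
    return dict(sorted(gaps.items(), key=lambda kv: -kv[1]))
```

Running this protocol in simulation and on the real robot yields one difficulty ordering per setting; the paper's finding is that these orderings agree across the two.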