Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation

Abstract

What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment from the perspective of a robot can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than generalization to others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Towards an answer to this question, we study imitation learning policies in simulation and on a real-robot, language-conditioned manipulation task to quantify the difficulty of generalization to different (sets of) factors. We also design a new simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors by generalization difficulty that is consistent across simulation and our real-robot setup.
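To make the idea of a per-factor generalization gap concrete, the Python sketch below assumes that the gap for a factor is the drop in task success rate between the training conditions and an evaluation in which only that factor (or set of factors) is shifted. It then orders the factors by that drop. The metric, the function names `generalization_gap` and `rank_factors_by_difficulty`, and the dictionary input format are hypothetical illustrations, not necessarily the exact procedure used in this study.

```python
from typing import Dict, List, Tuple


def generalization_gap(train_success: float, shifted_success: float) -> float:
    """Hypothetical metric: drop in success rate when one factor (or set) is shifted."""
    return train_success - shifted_success


def rank_factors_by_difficulty(
    train_success: float, shifted_success: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (factor, gap) pairs ordered from hardest (largest gap) to easiest.

    `shifted_success` maps a factor label (a combination can be labeled,
    e.g. "lighting+camera_pose") to the success rate measured when only
    that factor or set of factors differs from training.
    """
    # Success rates are fractions of successful episodes, so they must lie in [0, 1].
    for name, rate in [("train", train_success), *shifted_success.items()]:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"success rate for {name} must lie in [0, 1], got {rate}")

    gaps = {
        factor: generalization_gap(train_success, success)
        for factor, success in shifted_success.items()
    }
    # Sort by descending gap; ties are broken alphabetically for a deterministic order.
    return sorted(gaps.items(), key=lambda item: (-item[1], item[0]))
```

Subtracting success rates keeps the gap in the same units as the evaluation metric, so gaps for different factors can be compared directly when they share the same training baseline.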
