Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning
July 22, 2024
Authors: Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu
cs.AI
Abstract
Can we endow visuomotor robots with generalization capabilities to operate in
diverse open-world scenarios? In this paper, we propose Maniwhere, a
generalizable framework tailored for visual reinforcement learning, enabling
the trained robot policies to generalize across a combination of multiple
visual disturbance types. Specifically, we introduce a multi-view
representation learning approach fused with a Spatial Transformer Network (STN)
module to capture shared semantic information and correspondences among
different viewpoints. In addition, we employ a curriculum-based randomization
and augmentation approach to stabilize the RL training process and strengthen
the visual generalization ability. To exhibit the effectiveness of Maniwhere,
we meticulously design 8 tasks encompassing articulated-object, bimanual, and
dexterous hand manipulation tasks, demonstrating Maniwhere's strong visual
generalization and sim2real transfer abilities across 3 hardware platforms. Our
experiments show that Maniwhere significantly outperforms existing
state-of-the-art methods. Videos are provided at
https://gemcollector.github.io/maniwhere/.
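The abstract names two concrete ingredients: an image encoder fused with a Spatial Transformer Network (STN) module, and a multi-view representation objective that aligns features across camera viewpoints. The sketch below is a minimal illustration of how such a combination could be wired up in PyTorch; the network sizes, the InfoNCE-style alignment loss, and the identifiers (`STNEncoder`, `multiview_alignment_loss`) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the Maniwhere code): an STN front-end on the image
# encoder plus a loss that pulls embeddings of the same state, seen from two
# viewpoints, toward each other. All sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STNEncoder(nn.Module):
    """CNN encoder preceded by a Spatial Transformer Network module."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Localization network: predicts a 2x3 affine transform per image.
        self.localization = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, 6),
        )
        # Initialize to the identity transform so training starts stably.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # Feature encoder applied to the spatially transformed image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.localization(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x = F.grid_sample(x, grid, align_corners=False)
        return self.encoder(x)


def multiview_alignment_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: embeddings of the same state from two viewpoints
    are positives; other samples in the batch serve as negatives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    enc = STNEncoder()
    view_a = torch.randn(8, 3, 84, 84)   # camera 1 observations
    view_b = torch.randn(8, 3, 84, 84)   # camera 2 observations (same states)
    loss = multiview_alignment_loss(enc(view_a), enc(view_b))
    loss.backward()
    print(loss.item())
```

In the method described by the abstract, such an alignment term would be trained jointly with the RL objective, and the curriculum-based randomization and augmentation would gradually increase visual perturbation strength during training; neither of those pieces is shown in this sketch.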