
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

July 22, 2024
Authors: Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu
cs.AI

Abstract

Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling trained robot policies to generalize across combinations of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen visual generalization. To demonstrate the effectiveness of Maniwhere, we meticulously design 8 tasks encompassing articulated-object, bimanual, and dexterous-hand manipulation, demonstrating Maniwhere's strong visual generalization and sim2real transfer abilities across 3 hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://gemcollector.github.io/maniwhere/.
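
To make the abstract's architectural ideas concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a multi-view image encoder with a Spatial Transformer Network front end, plus a view-alignment loss that pulls together embeddings of the same scene rendered from two camera viewpoints. The module names (STN, MultiViewEncoder, view_alignment_loss), network sizes, and the cosine-similarity objective are assumptions for illustration only; the actual RL objective and curriculum-based randomization of Maniwhere are not shown.

```python
# Minimal sketch (not the authors' code): a shared CNN encoder with an STN
# front end, and an alignment loss between embeddings of two camera views.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STN(nn.Module):
    """Predicts a 2x3 affine transform and warps the input image with it."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.loc_net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 7, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )
        # Initialize the final layer to the identity transform for stable training.
        self.loc_net[-1].weight.data.zero_()
        self.loc_net[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc_net(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


class MultiViewEncoder(nn.Module):
    """Shared CNN encoder applied to each camera view after STN warping."""
    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.stn = STN(in_channels)
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.backbone(self.stn(x))


def view_alignment_loss(z_a, z_b):
    """Cosine-similarity loss aligning embeddings of two views of one scene."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    return (1.0 - (z_a * z_b).sum(dim=-1)).mean()


# Usage sketch: two camera views of the same simulated state share one encoder;
# an RL objective (e.g. an actor-critic loss) would be added on top of these features.
encoder = MultiViewEncoder()
view_a = torch.randn(8, 3, 84, 84)   # batch of images from camera A
view_b = torch.randn(8, 3, 84, 84)   # same states rendered from camera B
loss = view_alignment_loss(encoder(view_a), encoder(view_b))
```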
