
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

July 22, 2024
Authors: Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu
cs.AI

Abstract

Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling trained robot policies to generalize across combinations of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen visual generalization. To demonstrate the effectiveness of Maniwhere, we meticulously design 8 tasks encompassing articulated-object, bimanual, and dexterous-hand manipulation, demonstrating Maniwhere's strong visual generalization and sim2real transfer abilities across 3 hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://gemcollector.github.io/maniwhere/.
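
To make the abstract's architectural ideas concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a multi-view image encoder with a Spatial Transformer Network front end, plus a view-alignment loss that pulls together embeddings of the same scene rendered from two camera viewpoints. The module names (STN, MultiViewEncoder, view_alignment_loss), network sizes, and the cosine-similarity objective are assumptions for illustration only; the actual RL objective and curriculum-based randomization of Maniwhere are not shown.

```python
# Minimal sketch (not the authors' code): a shared CNN encoder with an STN
# front end, and an alignment loss between embeddings of two camera views.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STN(nn.Module):
    """Predicts a 2x3 affine transform and warps the input image with it."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.loc_net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 7, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )
        # Initialize the final layer to the identity transform for stable training.
        self.loc_net[-1].weight.data.zero_()
        self.loc_net[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc_net(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


class MultiViewEncoder(nn.Module):
    """Shared CNN encoder applied to each camera view after STN warping."""
    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.stn = STN(in_channels)
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.backbone(self.stn(x))


def view_alignment_loss(z_a, z_b):
    """Cosine-similarity loss aligning embeddings of two views of one scene."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    return (1.0 - (z_a * z_b).sum(dim=-1)).mean()


# Usage sketch: two camera views of the same simulated state share one encoder;
# an RL objective (e.g. an actor-critic loss) would be added on top of these features.
encoder = MultiViewEncoder()
view_a = torch.randn(8, 3, 84, 84)   # batch of images from camera A
view_b = torch.randn(8, 3, 84, 84)   # same states rendered from camera B
loss = view_alignment_loss(encoder(view_a), encoder(view_b))
```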
