Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
April 26, 2024
Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
cs.AI
Abstract
Autonomous robotic systems capable of learning novel manipulation tasks are
poised to transform industries from manufacturing to service automation.
However, modern methods (e.g., VIP and R3M) still face significant hurdles,
notably the domain gap among robotic embodiments and the sparsity of successful
task executions within specific action spaces, resulting in misaligned and
ambiguous task representations. We introduce Ag2Manip (Agent-Agnostic
representations for Manipulation), a framework aimed at surmounting these
challenges through two key innovations: a novel agent-agnostic visual
representation derived from human manipulation videos, with the specifics of
embodiments obscured to enhance generalizability; and an agent-agnostic action
representation abstracting a robot's kinematics to a universal agent proxy,
emphasizing crucial interactions between end-effector and object. Ag2Manip's
empirical validation across simulated benchmarks like FrankaKitchen, ManiSkill,
and PartManip shows a 325% increase in performance, achieved without
domain-specific demonstrations. Ablation studies underline the essential
contributions of the visual and action representations to this success.
Extending our evaluations to the real world, Ag2Manip significantly improves
imitation learning success rates from 50% to 77.5%, demonstrating its
effectiveness and generalizability across both simulated and physical
environments.