Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
April 26, 2024
作者: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
cs.AI
Abstract
Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task representations. We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework aimed at surmounting these challenges through two key innovations: a novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability; and an agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object. Ag2Manip's empirical validation across simulated benchmarks like FrankaKitchen, ManiSkill, and PartManip shows a 325% increase in performance, achieved without domain-specific demonstrations. Ablation studies underline the essential contributions of the visual and action representations to this success. Extending our evaluations to the real world, Ag2Manip significantly improves imitation learning success rates from 50% to 77.5%, demonstrating its effectiveness and generalizability across both simulated and physical environments.
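To make the two abstractions concrete, the minimal Python sketch below shows what an embodiment-masked observation and a proxy-agent action could look like. Everything here is an illustrative assumption rather than the paper's actual implementation: the function names are hypothetical, the blackout masking stands in for whatever embodiment-hiding the authors use, and the delta-position-plus-contact-flag action format is one plausible reading of "universal agent proxy".

```python
import numpy as np

def mask_embodiment(frame: np.ndarray, embodiment_mask: np.ndarray) -> np.ndarray:
    """Return a copy of the frame with agent-specific pixels hidden, so a
    visual representation trained on it cannot key on the embodiment.
    (Blackout is a hypothetical stand-in for the paper's masking scheme.)"""
    masked = frame.copy()
    masked[embodiment_mask] = 0
    return masked

def proxy_action(ee_pos: np.ndarray, ee_pos_next: np.ndarray, interacting: bool) -> dict:
    """Abstract a robot's kinematics to a universal proxy agent: a 3-D
    end-effector displacement plus a flag marking end-effector/object
    interaction. (Assumed format, not taken from the paper.)"""
    return {"delta_xyz": ee_pos_next - ee_pos, "interact": interacting}

# Toy usage: a 64x64 RGB frame whose top-left quadrant is the robot arm.
frame = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
arm_mask = np.zeros((64, 64), dtype=bool)
arm_mask[:32, :32] = True
obs = mask_embodiment(frame, arm_mask)
act = proxy_action(np.array([0.40, 0.00, 0.20]),
                   np.array([0.42, 0.00, 0.18]),
                   interacting=True)
```

Under this reading, any robot (or human hand) producing the same end-effector displacement and contact pattern maps to the same proxy action, which is what lets representations learned from human videos transfer across embodiments.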