Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

Abstract

Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, current methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, which result in misaligned and ambiguous task representations. We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework that aims to overcome these challenges through two key innovations: a novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability; and an agent-agnostic action representation that abstracts a robot's kinematics to a universal agent proxy, emphasizing the crucial interactions between end-effector and object. Empirical validation of Ag2Manip across simulated benchmarks such as FrankaKitchen, ManiSkill, and PartManip shows a 325% increase in performance, achieved without domain-specific demonstrations. Ablation studies underline the essential contributions of the visual and action representations to this success. In real-world evaluations, Ag2Manip significantly improves imitation learning success rates from 50% to 77.5%, demonstrating its effectiveness and generalizability across both simulated and physical environments.
