Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors
August 12, 2025
Authors: Haoyu Zhao, Linghao Zhuang, Xingyue Zhao, Cheng Zeng, Haoran Xu, Yuming Jiang, Jun Cen, Kexiang Wang, Jiayan Guo, Siteng Huang, Xin Li, Deli Zhao, Hua Zou
cs.AI
Abstract
A dexterous hand capable of grasping objects in a generalizable manner is
fundamental to the development of general-purpose embodied AI. However,
previous methods focus narrowly on low-level grasp-stability metrics,
neglecting the affordance-aware positioning and human-like poses that are
crucial for downstream manipulation.
To address these limitations, we propose AffordDex, a novel framework with
two-stage training that learns a universal grasping policy with an inherent
understanding of both motion priors and object affordances. In the first stage,
a trajectory imitator is pre-trained on a large corpus of human hand motions to
instill a strong prior for natural movement. In the second stage, a residual
module is trained to adapt these general human-like motions to specific object
instances. This refinement is critically guided by two components: our Negative
Affordance-aware Segmentation (NAA) module, which identifies functionally
inappropriate contact regions, and a privileged teacher-student distillation
process that ensures the final vision-based policy is highly successful.
Extensive experiments demonstrate that AffordDex not only achieves universal
dexterous grasping but also remains remarkably human-like in posture and
functionally appropriate in contact location. As a result, AffordDex
significantly outperforms state-of-the-art baselines across seen objects,
unseen instances, and even entirely novel categories.
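The two-stage design described above — a pre-trained trajectory imitator supplying a human-like base motion, refined by a residual module conditioned on the specific object — can be sketched as a simple policy composition. This is a minimal illustrative sketch inferred from the abstract alone; the class names, shapes, and stub computations are assumptions, not the authors' implementation.

```python
class TrajectoryImitator:
    """Stage 1 (stub): a policy pre-trained on human hand motions.

    Here it just damps the hand state; the real system would be a
    learned network producing a natural, human-like base action.
    """

    def base_action(self, hand_state):
        return [s * 0.9 for s in hand_state]


class ResidualModule:
    """Stage 2 (stub): a small correction conditioned on object features,
    adapting the general human-like motion to a specific object instance."""

    def correction(self, hand_state, object_features):
        # Placeholder: scale object features into a small action delta.
        return [0.1 * f for f in object_features]


def grasp_action(imitator, residual, hand_state, object_features):
    """Final action = human-like base motion + object-specific residual."""
    base = imitator.base_action(hand_state)
    delta = residual.correction(hand_state, object_features)
    return [b + d for b, d in zip(base, delta)]


hand = [0.2, -0.1, 0.5]          # hypothetical hand-state vector
obj = [1.0, 0.0, -1.0]           # hypothetical object-feature vector
action = grasp_action(TrajectoryImitator(), ResidualModule(), hand, obj)
```

The additive residual keeps the Stage-1 prior intact: if the correction is zero, the policy falls back to the pre-trained human-like motion, which is one common way such refinement stages are structured.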