Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors
August 12, 2025
Authors: Haoyu Zhao, Linghao Zhuang, Xingyue Zhao, Cheng Zeng, Haoran Xu, Yuming Jiang, Jun Cen, Kexiang Wang, Jiayan Guo, Siteng Huang, Xin Li, Deli Zhao, Hua Zou
cs.AI
Abstract
A dexterous hand capable of grasping objects in a generalizable manner is
fundamental to the development of general-purpose embodied AI. However,
previous methods focus narrowly on low-level grasp-stability metrics,
neglecting the affordance-aware positioning and human-like poses that are
crucial for downstream manipulation.
To address these limitations, we propose AffordDex, a novel framework with
two-stage training that learns a universal grasping policy with an inherent
understanding of both motion priors and object affordances. In the first stage,
a trajectory imitator is pre-trained on a large corpus of human hand motions to
instill a strong prior for natural movement. In the second stage, a residual
module is trained to adapt these general human-like motions to specific object
instances. This refinement is critically guided by two components: our Negative
Affordance-aware Segmentation (NAA) module, which identifies functionally
inappropriate contact regions, and a privileged teacher-student distillation
process that ensures the final vision-based policy is highly successful.
Extensive experiments demonstrate that AffordDex not only achieves universal
dexterous grasping but also remains remarkably human-like in posture and
functionally appropriate in contact location. As a result, AffordDex
significantly outperforms state-of-the-art baselines across seen objects,
unseen instances, and even entirely novel categories.
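The two-stage design described above — a pre-trained trajectory imitator supplying a human-like base motion, refined by a residual module conditioned on the specific object — can be sketched as a simple policy composition. This is a minimal illustrative sketch inferred from the abstract alone; the class names, shapes, and stub computations are assumptions, not the authors' implementation.

```python
class TrajectoryImitator:
    """Stage 1 (stub): a policy pre-trained on human hand motions.

    Here it just damps the hand state; the real system would be a
    learned network producing a natural, human-like base action.
    """

    def base_action(self, hand_state):
        return [s * 0.9 for s in hand_state]


class ResidualModule:
    """Stage 2 (stub): a small correction conditioned on object features,
    adapting the general human-like motion to a specific object instance."""

    def correction(self, hand_state, object_features):
        # Placeholder: scale object features into a small action delta.
        return [0.1 * f for f in object_features]


def grasp_action(imitator, residual, hand_state, object_features):
    """Final action = human-like base motion + object-specific residual."""
    base = imitator.base_action(hand_state)
    delta = residual.correction(hand_state, object_features)
    return [b + d for b, d in zip(base, delta)]


hand = [0.2, -0.1, 0.5]          # hypothetical hand-state vector
obj = [1.0, 0.0, -1.0]           # hypothetical object-feature vector
action = grasp_action(TrajectoryImitator(), ResidualModule(), hand, obj)
```

The additive residual keeps the Stage-1 prior intact: if the correction is zero, the policy falls back to the pre-trained human-like motion, which is one common way such refinement stages are structured.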