Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors
August 12, 2025
作者: Haoyu Zhao, Linghao Zhuang, Xingyue Zhao, Cheng Zeng, Haoran Xu, Yuming Jiang, Jun Cen, Kexiang Wang, Jiayan Guo, Siteng Huang, Xin Li, Deli Zhao, Hua Zou
cs.AI
Abstract
A dexterous hand capable of generalizable grasping is fundamental to
the development of general-purpose embodied AI. However, previous methods focus
narrowly on low-level grasp stability metrics, neglecting the affordance-aware
positioning and human-like poses that are crucial for downstream manipulation.
To address these limitations, we propose AffordDex, a novel framework with
two-stage training that learns a universal grasping policy with an inherent
understanding of both motion priors and object affordances. In the first stage,
a trajectory imitator is pre-trained on a large corpus of human hand motions to
instill a strong prior for natural movement. In the second stage, a residual
module is trained to adapt these general human-like motions to specific object
instances. This refinement is guided by two key components: our Negative
Affordance-aware Segmentation (NAA) module, which identifies functionally
inappropriate contact regions, and a privileged teacher-student distillation
process that ensures the final vision-based policy achieves a high success rate.
Extensive experiments demonstrate that AffordDex not only achieves universal
dexterous grasping but also remains remarkably human-like in posture and
functionally appropriate in contact location. As a result, AffordDex
significantly outperforms state-of-the-art baselines across seen objects,
unseen instances, and even entirely novel categories.
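To make the two mechanisms the abstract names more concrete, here is a minimal PyTorch-style sketch of (a) a residual module that refines the actions of a frozen, pre-trained motion prior, and (b) a privileged teacher-student distillation step in which a vision-only student imitates a teacher with full-state access. All module names, network shapes, and the MSE imitation loss are illustrative assumptions; the paper's actual architecture and objectives are not specified in this abstract.

```python
import torch
import torch.nn as nn

class ResidualGraspPolicy(nn.Module):
    """Hypothetical sketch: a frozen motion prior (stage 1) plus a trainable
    residual module (stage 2) that adapts the base action to a specific object."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Stage 1: base policy pre-trained on human hand motions, frozen here.
        self.base = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim)
        )
        for p in self.base.parameters():
            p.requires_grad = False
        # Stage 2: residual module conditioned on the observation and base action.
        self.residual = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        a_base = self.base(obs)  # human-like prior action
        delta = self.residual(torch.cat([obs, a_base], dim=-1))
        return a_base + delta    # object-specific refinement


def distillation_loss(student: nn.Module, teacher: nn.Module,
                      vision_obs: torch.Tensor,
                      privileged_obs: torch.Tensor) -> torch.Tensor:
    """Hypothetical distillation step: the vision-based student regresses
    the actions of a teacher that sees privileged (full-state) observations."""
    with torch.no_grad():
        target = teacher(privileged_obs)
    return nn.functional.mse_loss(student(vision_obs), target)
```

In a full pipeline the teacher would be trained on privileged state (e.g., object pose and contact information) and the student would consume visual observations only; the MSE imitation loss above is a stand-in for whatever distillation objective the paper actually uses.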