Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors
August 12, 2025
作者: Haoyu Zhao, Linghao Zhuang, Xingyue Zhao, Cheng Zeng, Haoran Xu, Yuming Jiang, Jun Cen, Kexiang Wang, Jiayan Guo, Siteng Huang, Xin Li, Deli Zhao, Hua Zou
cs.AI
Abstract
A dexterous hand capable of generalizable grasping is fundamental to
the development of general-purpose embodied AI. However, previous methods focus
narrowly on low-level grasp stability metrics, neglecting the affordance-aware
positioning and human-like poses that are crucial for downstream manipulation.
To address these limitations, we propose AffordDex, a novel framework with
two-stage training that learns a universal grasping policy with an inherent
understanding of both motion priors and object affordances. In the first stage,
a trajectory imitator is pre-trained on a large corpus of human hand motions to
instill a strong prior for natural movement. In the second stage, a residual
module is trained to adapt these general human-like motions to specific object
instances. This refinement is guided by two key components: our Negative
Affordance-aware Segmentation (NAA) module, which identifies functionally
inappropriate contact regions, and a privileged teacher-student distillation
process that ensures the final vision-based policy achieves a high success rate.
Extensive experiments demonstrate that AffordDex not only achieves universal
dexterous grasping but also remains remarkably human-like in posture and
functionally appropriate in contact location. As a result, AffordDex
significantly outperforms state-of-the-art baselines across seen objects,
unseen instances, and even entirely novel categories.
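To make the two mechanisms the abstract names more concrete, here is a minimal PyTorch-style sketch of (a) a residual module that refines the actions of a frozen, pre-trained motion prior, and (b) a privileged teacher-student distillation step in which a vision-only student imitates a teacher with full-state access. All module names, network shapes, and the MSE imitation loss are illustrative assumptions; the paper's actual architecture and objectives are not specified in this abstract.

```python
import torch
import torch.nn as nn

class ResidualGraspPolicy(nn.Module):
    """Hypothetical sketch: a frozen motion prior (stage 1) plus a trainable
    residual module (stage 2) that adapts the base action to a specific object."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Stage 1: base policy pre-trained on human hand motions, frozen here.
        self.base = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim)
        )
        for p in self.base.parameters():
            p.requires_grad = False
        # Stage 2: residual module conditioned on the observation and base action.
        self.residual = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        a_base = self.base(obs)  # human-like prior action
        delta = self.residual(torch.cat([obs, a_base], dim=-1))
        return a_base + delta    # object-specific refinement


def distillation_loss(student: nn.Module, teacher: nn.Module,
                      vision_obs: torch.Tensor,
                      privileged_obs: torch.Tensor) -> torch.Tensor:
    """Hypothetical distillation step: the vision-based student regresses
    the actions of a teacher that sees privileged (full-state) observations."""
    with torch.no_grad():
        target = teacher(privileged_obs)
    return nn.functional.mse_loss(student(vision_obs), target)
```

In a full pipeline the teacher would be trained on privileged state (e.g., object pose and contact information) and the student would consume visual observations only; the MSE imitation loss above is a stand-in for whatever distillation objective the paper actually uses.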