Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors

Abstract

A dexterous hand capable of generalizable object grasping is fundamental for the development of general-purpose embodied AI. However, previous methods focus narrowly on low-level grasp stability metrics, neglecting affordance-aware positioning and human-like poses, both of which are crucial for downstream manipulation. To address these limitations, we propose AffordDex, a novel framework with two-stage training that learns a universal grasping policy with an inherent understanding of both motion priors and object affordances. In the first stage, a trajectory imitator is pre-trained on a large corpus of human hand motions to instill a strong prior for natural movement. In the second stage, a residual module is trained to adapt these general human-like motions to specific object instances. This refinement is guided by two components: our Negative Affordance-aware Segmentation (NAA) module, which identifies functionally inappropriate contact regions, and a privileged teacher-student distillation process that ensures the final vision-based policy achieves a high success rate. Extensive experiments demonstrate that AffordDex not only achieves universal dexterous grasping but also remains remarkably human-like in posture and functionally appropriate in contact location. As a result, AffordDex significantly outperforms state-of-the-art baselines across seen objects, unseen instances, and even entirely novel categories.
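The second-stage idea, a residual correction applied on top of a pre-trained imitator's action, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the residual scale, the action bounds, and the simple contact-penalty formula are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def compose_residual_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.1,   # assumed residual scale, not from the paper
    low: float = -1.0,    # assumed normalized action bounds
    high: float = 1.0,
) -> List[float]:
    """Add a scaled residual to a base (imitator) action, then clip to bounds."""
    if len(base_action) != len(residual):
        raise ValueError("base_action and residual must have the same length")
    return [
        min(high, max(low, b + scale * r))
        for b, r in zip(base_action, residual)
    ]


def negative_contact_fraction(in_negative_region: Sequence[bool]) -> float:
    """Fraction of contact points that fall inside a (hypothetical) negative-affordance mask.

    A training loop could use this value as a penalty term; how the paper
    actually uses the NAA output is not specified in the abstract.
    """
    if not in_negative_region:
        return 0.0
    return sum(1 for flag in in_negative_region if flag) / len(in_negative_region)


if __name__ == "__main__":
    # Hypothetical 3-joint action from the pre-trained imitator, plus a residual.
    action = compose_residual_action([0.5, -0.2, 0.95], [1.0, -1.0, 1.0])
    print(action)
    print(negative_contact_fraction([True, False, False, False]))
```

Keeping the residual small relative to the base action is one common way to preserve the human-like prior while still allowing object-specific adjustment; the actual design in AffordDex may differ.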
