UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Abstract

Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data scarcity being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data-generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
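
The abstract does not describe the policy architecture in detail. As a rough illustration only, the sketch below shows one plausible reading of "aggregates scene features via unidirectional attention": a small set of query tokens reads from per-point features through scaled dot-product attention, while the point features are never updated from the queries. The function names, token counts, and feature width are hypothetical and are not taken from the UltraDexGrasp code; learned projections and the command-prediction head are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, point_feats):
    """One-way scaled dot-product attention (assumed design, not the paper's code).

    Each query token reads from the point features. The point features are
    never updated from the queries, so information flows in one direction
    only: scene -> query.
    """
    dim = len(point_feats[0])
    aggregated = []
    for q in queries:
        # Similarity between this query and every point feature.
        scores = [
            sum(a * b for a, b in zip(q, p)) / math.sqrt(dim)
            for p in point_feats
        ]
        weights = softmax(scores)
        # Weighted average of the point features for this query.
        aggregated.append(
            [sum(w * p[j] for w, p in zip(weights, point_feats)) for j in range(dim)]
        )
    return aggregated


rng = random.Random(0)
# Hypothetical sizes: 8 per-point features of width 4, and 2 query tokens.
point_feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
queries = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]

scene_summary = cross_attend(queries, point_feats)
print(len(scene_summary), len(scene_summary[0]))
```

In a full policy, each aggregated vector would presumably be passed to a prediction head that outputs control commands; that stage is left out here because the abstract gives no details about it.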
