UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data
March 5, 2026
Authors: Sizhe Yang, Yiman Xie, Zhixuan Liang, Yang Tian, Jia Zeng, Dahua Lin, Jiangmiao Pang
cs.AI
Abstract
Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
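The policy described above aggregates scene features from point clouds via unidirectional attention before predicting control commands. The paper does not give implementation details, but the aggregation step can be sketched as a single cross-attention pass in which a small set of query tokens attends to per-point scene features (one direction only: queries read from the scene, the scene tokens never attend back). The shapes, token counts, and the toy linear head below are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_scene(queries, scene_feats):
    """One unidirectional (cross) attention pass: queries attend to scene
    features, but scene features are never updated from the queries."""
    d = queries.shape[-1]
    scores = queries @ scene_feats.T / np.sqrt(d)   # (Q, N) attention logits
    weights = softmax(scores, axis=-1)              # each query's distribution over points
    return weights @ scene_feats                    # (Q, d) aggregated features

rng = np.random.default_rng(0)
N, Q, d = 1024, 4, 64                    # hypothetical: scene points, query tokens, feature dim
scene = rng.standard_normal((N, d))      # stand-in for encoded point-cloud features
queries = rng.standard_normal((Q, d))    # stand-in for learned query tokens

agg = aggregate_scene(queries, scene)    # (4, 64) scene summary
# Toy linear head mapping the summary to a control command
# (14 dims chosen arbitrarily here, e.g. two 7-DoF arm targets).
cmd = agg.reshape(-1) @ rng.standard_normal((Q * d, 14))
print(agg.shape, cmd.shape)
```

The one-directional flow keeps the per-step cost linear in the number of scene points (Q·N score entries rather than a full N×N self-attention), which is one plausible reason for choosing this design in a real-time grasping policy.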