UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data
March 5, 2026
Authors: Sizhe Yang, Yiman Xie, Zhixuan Liang, Yang Tian, Jia Zeng, Dahua Lin, Jiangmiao Pang
cs.AI
Abstract
Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data scarcity being the primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
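The abstract names three stages for the policy: encode a scene point cloud, aggregate the resulting features with unidirectional attention, and regress control commands. The sketch below illustrates only that high-level data flow; it is not the authors' architecture. All concrete choices (the per-point MLP encoder, the model width, the causal attention mask, mean-pooling, and the placeholder `action_dim` for two arms plus two dexterous hands) are assumptions for illustration.

```python
import torch
import torch.nn as nn


class GraspPolicySketch(nn.Module):
    """Hypothetical point-cloud grasp policy sketch (NOT the paper's model).

    Illustrates the pipeline stated in the abstract: point-cloud input ->
    unidirectional (causally masked) attention over scene features ->
    predicted control command.
    """

    def __init__(self, d_model=128, n_heads=4, action_dim=48):
        super().__init__()
        # Per-point encoder: a stand-in for a real point-cloud backbone.
        self.point_encoder = nn.Sequential(
            nn.Linear(3, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Action head: joint targets for both arms and hands
        # (action_dim=48 is an arbitrary placeholder, not from the paper).
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, points):
        # points: (B, N, 3) scene point cloud in the robot frame.
        feats = self.point_encoder(points)  # (B, N, d_model)
        n = feats.shape[1]
        # Unidirectional attention: token i attends only to tokens <= i.
        causal_mask = torch.triu(
            torch.ones(n, n, dtype=torch.bool, device=points.device), diagonal=1
        )
        agg, _ = self.attn(feats, feats, feats, attn_mask=causal_mask)
        # Pool the aggregated scene features and predict one control command.
        return self.head(agg.mean(dim=1))  # (B, action_dim)


policy = GraspPolicySketch()
cmd = policy(torch.randn(2, 1024, 3))  # batch of 2 clouds, 1024 points each
print(cmd.shape)  # torch.Size([2, 48])
```

In practice such a policy would be queried in a closed loop, emitting a command per observation; the sketch shows only a single forward pass.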