UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Abstract

Grasping is a fundamental capability for robots to interact with the physical world. Humans, equipped with two hands, autonomously select appropriate grasp strategies based on the shape, size, and weight of objects, enabling robust grasping and subsequent manipulation. In contrast, current robotic grasping remains limited, particularly in multi-strategy settings. Although substantial efforts have targeted parallel-gripper and single-hand grasping, dexterous grasping for bimanual robots remains underexplored, with data being a primary bottleneck. Achieving physically plausible and geometrically conforming grasps that can withstand external wrenches poses significant challenges. To address these issues, we introduce UltraDexGrasp, a framework for universal dexterous grasping with bimanual robots. The proposed data-generation pipeline integrates optimization-based grasp synthesis with planning-based demonstration generation, yielding high-quality and diverse trajectories across multiple grasp strategies. With this framework, we curate UltraDexGrasp-20M, a large-scale, multi-strategy grasp dataset comprising 20 million frames across 1,000 objects. Based on UltraDexGrasp-20M, we further develop a simple yet effective grasp policy that takes point clouds as input, aggregates scene features via unidirectional attention, and predicts control commands. Trained exclusively on synthetic data, the policy achieves robust zero-shot sim-to-real transfer and consistently succeeds on novel objects with varied shapes, sizes, and weights, attaining an average success rate of 81.2% in real-world universal dexterous grasping. To facilitate future research on grasping with bimanual robots, we open-source the data generation pipeline at https://github.com/InternRobotics/UltraDexGrasp.
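The abstract says only that the policy "aggregates scene features via unidirectional attention"; it does not give the architecture. The sketch below assumes one plausible reading. Point-cloud (scene) tokens attend only to each other, while query tokens attend to the scene tokens and to the other queries, so information flows from the scene to the queries and never back. It uses a single head with identity query/key/value projections and no learned weights. The names `build_unidirectional_mask` and `masked_attention` and all feature values are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List

Vector = List[float]


def build_unidirectional_mask(num_scene: int, num_query: int) -> List[List[bool]]:
    """Hypothetical mask: mask[i][j] is True when token i may attend to token j.

    Tokens 0..num_scene-1 are scene (point-cloud) tokens; the rest are query
    tokens. Scene tokens see only scene tokens. Query tokens see every token,
    so information flows scene -> query but never query -> scene.
    """
    n = num_scene + num_query
    return [[(j < num_scene) if i < num_scene else True for j in range(n)]
            for i in range(n)]


def masked_attention(tokens: List[Vector], mask: List[List[bool]]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity Q/K/V (an assumption)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, q in enumerate(tokens):
        # Only keys permitted by the mask take part in the softmax.
        allowed = [j for j in range(len(tokens)) if mask[i][j]]
        scores = [scale * sum(a * b for a, b in zip(q, tokens[j])) for j in allowed]
        peak = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out = [0.0] * dim
        for w, j in zip(weights, allowed):
            for d in range(dim):
                out[d] += (w / total) * tokens[j][d]
        outputs.append(out)
    return outputs


# Usage with made-up features: three point tokens and one query token.
scene = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[0.2, -0.1]]
mask = build_unidirectional_mask(len(scene), len(queries))
out = masked_attention(scene + queries, mask)
command_features = out[len(scene):]  # in a full policy, these would feed a control head
```

A one-way mask of this kind keeps the scene encoding independent of the query tokens. Changing a query changes only the query outputs, which can then be decoded into control commands.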
