RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

Abstract

Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation because of two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement to generate task-level execution code automatically. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes (clutter, lighting, background, tabletop height, and language instructions), thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments, and pre-collect over 100,000 domain-randomized expert trajectories. Empirical results show a 10.9% gain in code generation success and improved generalization to novel real-world scenarios. A VLA model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on real-world tasks in unseen scenes, while zero-shot models trained solely on our synthetic data achieve a 228% relative gain, highlighting strong generalization without real-world supervision. We release the data generator, benchmark, dataset, and code to support scalable research in robust bimanual manipulation.
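The 367% figure is a relative gain, not an absolute one. The sketch below reproduces it from the two success rates quoted in the abstract, assuming the conventional definition (new - baseline) / baseline. The function name `relative_improvement` is illustrative and does not come from the paper or its code release.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent.

    Assumes the conventional definition (new - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Success rates quoted in the abstract for the fine-tuned VLA model (percent).
fine_tuned, baseline = 42.0, 9.0

gain = relative_improvement(fine_tuned, baseline)
print(f"relative gain: {gain:.0f}%")                      # relative gain: 367%
print(f"absolute gain: {fine_tuned - baseline:.1f} pts")  # absolute gain: 33.0 pts
```

In absolute terms the same result is a 33.0 percentage-point increase in success rate. The abstract does not quote the underlying rates for the 228% zero-shot figure, so that number is not reproduced here.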
