RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
June 22, 2025
Authors: Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, Yao Mu
cs.AI
Abstract
Simulation-based data synthesis has emerged as a powerful paradigm for
enhancing real-world robotic manipulation. However, existing synthetic datasets
remain insufficient for robust bimanual manipulation due to two challenges: (1)
the lack of an efficient, scalable data generation method for novel tasks, and
(2) oversimplified simulation environments that fail to capture real-world
complexity. We present RoboTwin 2.0, a scalable simulation framework that
enables automated, large-scale generation of diverse and realistic data, along
with unified evaluation protocols for dual-arm manipulation. We first construct
RoboTwin-OD, a large-scale object library comprising 731 instances across 147
categories, each annotated with semantic and manipulation-relevant labels.
Building on this foundation, we develop an expert data synthesis pipeline that
combines multimodal large language models (MLLMs) with simulation-in-the-loop
refinement to generate task-level execution code automatically. To improve
sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization
along five axes: clutter, lighting, background, tabletop height, and language
instructions, thereby enhancing data diversity and policy robustness. We
instantiate this framework across 50 dual-arm tasks spanning five robot
embodiments, and pre-collect over 100,000 domain-randomized expert
trajectories. Empirical results show a 10.9% gain in code generation success
and improved generalization to novel real-world scenarios. A VLA model
fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%)
on real-world tasks in unseen scenes, while zero-shot models trained solely on our
synthetic data achieve a 228% relative gain, highlighting strong generalization
without real-world supervision. We release the data generator, benchmark,
dataset, and code to support scalable research in robust bimanual manipulation.
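The structured domain randomization described in the abstract can be pictured as drawing a fresh configuration along the five axes for each simulated episode. The sketch below is a minimal illustration of that idea only; the parameter names, ranges, and choices are assumptions, not RoboTwin 2.0's actual configuration.

```python
import random

# Illustrative per-episode domain randomization along the five axes named in
# the abstract: clutter, lighting, background, tabletop height, and language
# instructions. All names and ranges here are hypothetical placeholders.
def sample_episode_config(rng: random.Random) -> dict:
    return {
        "clutter_objects": rng.randint(0, 8),           # number of distractors
        "light_intensity": rng.uniform(0.3, 1.5),       # relative brightness
        "background": rng.choice(["wood", "plain", "cloth"]),
        "tabletop_height_m": rng.uniform(0.70, 0.90),   # table height in meters
        "instruction": rng.choice([                     # paraphrased commands
            "pick up the bottle",
            "grab the bottle with the left gripper",
        ]),
    }

# A seeded generator makes the randomized dataset reproducible.
rng = random.Random(0)
configs = [sample_episode_config(rng) for _ in range(3)]
```

Sampling every axis independently per episode is what drives the data diversity the abstract credits for improved sim-to-real transfer: a policy trained on such trajectories never sees a fixed scene layout.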