RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

June 22, 2025
Authors: Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, Yao Mu
cs.AI

Abstract

Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement to generate task-level execution code automatically. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes: clutter, lighting, background, tabletop height, and language instructions, thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments, and pre-collect over 100,000 domain-randomized expert trajectories. Empirical results show a 10.9% gain in code generation success and improved generalization to novel real-world scenarios. A VLA model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on unseen scene real-world tasks, while zero-shot models trained solely on our synthetic data achieve a 228% relative gain, highlighting strong generalization without real-world supervision. We release the data generator, benchmark, dataset, and code to support scalable research in robust bimanual manipulation.
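
The headline numbers unpack straightforwardly: the 367% figure is the relative gain (42.0 − 9.0) / 9.0 ≈ 3.67 over the 9.0% baseline, i.e. roughly 4.7× the baseline success rate on unseen-scene real-world tasks. The five-axis domain randomization described above amounts to sampling each axis independently for every generated scene. The sketch below illustrates that idea in Python; the class, object pools, instruction templates, and value ranges are hypothetical stand-ins for exposition, not the actual RoboTwin 2.0 API.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of five-axis domain randomization (clutter, lighting,
# background, tabletop height, language instructions). All names, pools, and
# ranges below are assumptions for illustration, not RoboTwin 2.0's real API.

@dataclass
class SceneConfig:
    clutter_objects: list     # distractor objects scattered on the tabletop
    light_intensity: float    # relative brightness scale
    light_color: tuple        # RGB tint applied to the scene lights
    background_texture: str   # backdrop / table-surface asset id (assumed)
    table_height_m: float     # tabletop height in meters
    instruction: str          # natural-language task instruction

CLUTTER_POOL = ["mug", "stapler", "tape", "bowl", "marker"]   # assumed assets
BACKGROUNDS = ["wood", "marble", "cloth", "metal"]            # assumed assets
INSTRUCTION_TEMPLATES = [                                     # assumed phrasing
    "pick up the {obj} and hand it to the other arm",
    "use both arms to place the {obj} in the container",
]

def sample_scene(target_obj: str, rng: random.Random) -> SceneConfig:
    """Draw one randomized scene; each of the five axes is sampled independently."""
    return SceneConfig(
        clutter_objects=rng.sample(CLUTTER_POOL, k=rng.randint(0, 4)),
        light_intensity=rng.uniform(0.3, 1.5),
        light_color=tuple(round(rng.uniform(0.8, 1.0), 3) for _ in range(3)),
        background_texture=rng.choice(BACKGROUNDS),
        table_height_m=round(rng.uniform(0.70, 0.85), 3),
        instruction=rng.choice(INSTRUCTION_TEMPLATES).format(obj=target_obj),
    )

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_scene("bottle", rng))
```

Resampling every axis independently for each of the 100,000+ expert trajectories is what gives the dataset the visual and linguistic spread that the abstract credits for generalization without real-world supervision.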