GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs

October 4, 2024
Authors: Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang
cs.AI

Abstract

Robotic simulation remains difficult to scale because creating diverse simulation tasks and scenes requires human effort. Simulation-trained policies also face scalability issues, as many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that leverages coding LLMs with multi-modal and reasoning capabilities to create complex and realistic simulation tasks, including long-horizon tasks with articulated objects. To automatically generate demonstration data for these tasks at scale, we propose planning and RL solvers that generalize within object categories. The pipeline can generate data for up to 100 articulated tasks with 200 objects and reduces the required human effort. To utilize such data, we propose an effective multi-task language-conditioned policy architecture, dubbed proprioceptive point-cloud transformer (PPT), that learns from the generated demonstrations and exhibits strong sim-to-real zero-shot transfer. Combining the proposed pipeline and policy architecture, we show a promising use of GenSim2: the generated data can be used for zero-shot transfer or for co-training with real-world collected data, which improves policy performance by 20% compared with training exclusively on limited real data.