GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs

Abstract

Robotic simulation remains difficult to scale because creating diverse simulation tasks and scenes requires substantial human effort. Simulation-trained policies face scalability issues of their own, since many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that leverages coding LLMs with multi-modal and reasoning capabilities to create complex and realistic simulation tasks, including long-horizon tasks with articulated objects. To automatically generate demonstration data for these tasks at scale, we propose planning and RL solvers that generalize within object categories. The pipeline can generate data for up to 100 articulated tasks with 200 objects while reducing the human effort required. To make use of this data, we propose an effective multi-task, language-conditioned policy architecture, dubbed proprioceptive point-cloud transformer (PPT), that learns from the generated demonstrations and exhibits strong zero-shot sim-to-real transfer. Combining the proposed pipeline with this policy architecture, we show a promising use of GenSim2: the generated data can be used for zero-shot transfer or for co-training with data collected in the real world, which improves policy performance by 20% compared with training exclusively on limited real data.
