SimFoundry: Modular, Automated Scene Generation for Policy Learning and Evaluation

Abstract

Training and evaluating robot policies in the real world is costly and difficult to scale. We introduce SimFoundry, a modular, automated system for zero-shot real-to-sim scene construction from a video. SimFoundry generates sim-ready digital twins and supports object, scene, and task editing, enabling the automated generation of diverse digital cousins: affordance-preserving variations of reconstructed real-world scenes. Policies trained on SimFoundry data transfer zero-shot to challenging real-world tasks involving multi-step manipulation, articulated-object interaction, and bimanual interaction, and its digital cousins (variations of the original scene, objects, and tasks) facilitate generalization to new real-world conditions. Across 7 manipulation tasks and 5 policy architectures, SimFoundry simulation evaluations strongly predict real-world performance, with a mean Pearson correlation of 0.911 and a mean maximum ranking violation of 0.018. When sim-trained policies are evaluated zero-shot in the real world, those trained in simulation with object, scene, and task cousins show average task success rate improvements of 17%, 21%, and 40%, respectively. Additional details are available at https://research.nvidia.com/labs/gear/simfoundry/ .
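
The two sim-to-real agreement metrics reported above can be illustrated with a minimal sketch. The abstract does not define them, so this assumes the standard Pearson correlation between per-policy simulation and real-world success rates, and a commonly used definition of mean maximum ranking violation: for each policy, take the largest real-world performance gap to any other policy that simulation ranks in the opposite order, then average over policies. The function names and the success rates below are hypothetical and are not SimFoundry results.

```python
from math import sqrt


def pearson(sim, real):
    """Pearson correlation between paired sim and real success rates."""
    n = len(sim)
    mean_sim = sum(sim) / n
    mean_real = sum(real) / n
    cov = sum((s - mean_sim) * (r - mean_real) for s, r in zip(sim, real))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_real = sum((r - mean_real) ** 2 for r in real)
    return cov / sqrt(var_sim * var_real)


def mean_max_rank_violation(sim, real):
    """Assumed definition: average, over policies, of the worst real-world
    gap to another policy whose relative order is flipped in simulation."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        worst = 0.0
        for j in range(n):
            # A violation occurs when sim and real disagree on the ordering.
            if (sim[i] < sim[j]) != (real[i] < real[j]):
                worst = max(worst, abs(real[i] - real[j]))
        total += worst
    return total / n


if __name__ == "__main__":
    # Hypothetical success rates for five policies (illustration only).
    sim_success = [0.80, 0.55, 0.30, 0.65, 0.10]
    real_success = [0.75, 0.60, 0.25, 0.50, 0.15]
    print("Pearson r:", round(pearson(sim_success, real_success), 3))
    print("MMRV:", round(mean_max_rank_violation(sim_success, real_success), 3))
```

Under this reading, a Pearson value near 1 and a violation value near 0 together indicate that simulation preserves both the magnitude and the ordering of real-world policy performance.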