RoboScape: Physics-informed Embodied World Model

Abstract

World models have become indispensable tools for embodied intelligence, serving as powerful simulators that generate realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, which leads to unrealistic video generation in contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction, which enhances 3D geometric consistency in video rendering, and keypoint dynamics learning, which implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling. Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications, including robotic policy training with generated data and policy evaluation. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research. The code is available at: https://github.com/tsinghua-fib-lab/RoboScape
