RoboScape: Physics-informed Embodied World Model

June 29, 2025
Authors: Yu Shang, Xin Zhang, Yinzhou Tang, Lei Jin, Chen Gao, Wei Wu, Yong Li
cs.AI

Abstract

World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling. Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications including robotic policy training with generated data and policy evaluation. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research. The code is available at: https://github.com/tsinghua-fib-lab/RoboScape.
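
The abstract does not state how the three learning signals are combined. As a rough illustration only, the sketch below assumes a weighted sum of per-task mean squared errors for RGB generation, temporal depth prediction, and keypoint dynamics. The function names, the `JointLossWeights` class, and the weight values are hypothetical and are not taken from the RoboScape code.

```python
from dataclasses import dataclass
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length flat sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


@dataclass(frozen=True)
class JointLossWeights:
    # Hypothetical weights; the abstract gives no values.
    rgb: float = 1.0
    depth: float = 0.5
    keypoint: float = 0.5


def joint_loss(
    rgb_pred: Sequence[float], rgb_true: Sequence[float],
    depth_pred: Sequence[float], depth_true: Sequence[float],
    kp_pred: Sequence[float], kp_true: Sequence[float],
    weights: JointLossWeights = JointLossWeights(),
) -> float:
    """Assumed joint objective: weighted sum of the three task errors."""
    return (
        weights.rgb * mse(rgb_pred, rgb_true)            # RGB video generation
        + weights.depth * mse(depth_pred, depth_true)    # temporal depth prediction
        + weights.keypoint * mse(kp_pred, kp_true)       # keypoint dynamics
    )
```

In this sketch the depth and keypoint terms act as auxiliary signals alongside the RGB term, which is one plausible reading of "joint training"; the actual objective may differ.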
July 1, 2025