

Robot Learning from a Physical World Model

November 10, 2025
作者: Jiageng Mao, Sicheng He, Hao-Ning Wu, Yang You, Shuyang Sun, Zhicheng Wang, Yanan Bao, Huizhong Chen, Leonidas Guibas, Vitor Guizilini, Howard Zhou, Yue Wang
cs.AI

Abstract

We introduce PhysWorld, a framework that enables robot learning from video generation through physical world modeling. Recent video generation models can synthesize photorealistic visual demonstrations from language commands and images, offering a powerful yet underexplored source of training signals for robotics. However, directly retargeting pixel motions from generated videos to robots neglects physics, often resulting in inaccurate manipulation. PhysWorld addresses this limitation by coupling video generation with physical world reconstruction. Given a single image and a task command, our method generates a task-conditioned video and reconstructs the underlying physical world from it; the generated video motions are then grounded into physically accurate actions through object-centric residual reinforcement learning with the physical world model. This synergy transforms implicit visual guidance into physically executable robotic trajectories, eliminating the need for real robot data collection and enabling zero-shot generalizable robotic manipulation. Experiments on diverse real-world tasks demonstrate that PhysWorld substantially improves manipulation accuracy compared to previous approaches. Visit the project webpage at https://pointscoder.github.io/PhysWorld_Web/ for details.
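For readers unfamiliar with the residual reinforcement learning idea mentioned in the abstract, the sketch below illustrates the general pattern in Python: a small learned correction is added to a base action retargeted from the generated video, and the combined action is evaluated in a physics simulator reconstructed from the scene. Every name here (ResidualPolicy, the sim interface with reset/step, the crude random-search loop standing in for RL) is an illustrative assumption for exposition, not the authors' implementation or the paper's actual training procedure.

```python
import numpy as np

class ResidualPolicy:
    """Tiny linear policy that outputs a bounded residual correction.
    Illustrative only; the paper's policy architecture is not specified here."""
    def __init__(self, obs_dim, act_dim, scale=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((act_dim, obs_dim))
        self.scale = scale  # keep residuals small relative to the base action

    def residual(self, obs):
        return self.scale * np.tanh(self.W @ obs)

def rollout(sim, policy, base_actions):
    """Execute base (video-retargeted) actions plus learned residuals in a
    hypothetical physics simulator exposing reset() and step(action)."""
    obs = sim.reset()
    total_reward = 0.0
    for a_base in base_actions:
        a = a_base + policy.residual(obs)  # residual RL: correct, don't replace
        obs, reward, done = sim.step(a)
        total_reward += reward
        if done:
            break
    return total_reward

def train_random_search(sim, base_actions, obs_dim, act_dim, iters=50, sigma=0.02):
    """Crude black-box search over policy weights, standing in for an RL
    algorithm: keep weight perturbations that improve the rollout reward."""
    policy = ResidualPolicy(obs_dim, act_dim)
    best = rollout(sim, policy, base_actions)
    for _ in range(iters):
        noise = sigma * np.random.standard_normal(policy.W.shape)
        policy.W += noise
        score = rollout(sim, policy, base_actions)
        if score >= best:
            best = score        # keep the perturbation
        else:
            policy.W -= noise   # revert
    return policy, best
```

The key design point this sketch captures is that the policy only learns a small correction on top of the video-derived motion, so the generated video supplies the overall trajectory while the learned residual absorbs physical inaccuracies.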