Robot Learning from a Physical World Model
November 10, 2025
Authors: Jiageng Mao, Sicheng He, Hao-Ning Wu, Yang You, Shuyang Sun, Zhicheng Wang, Yanan Bao, Huizhong Chen, Leonidas Guibas, Vitor Guizilini, Howard Zhou, Yue Wang
cs.AI
Abstract
We introduce PhysWorld, a framework that enables robot learning from video
generation through physical world modeling. Recent video generation models can
synthesize photorealistic visual demonstrations from language commands and
images, offering a powerful yet underexplored source of training signals for
robotics. However, directly retargeting pixel motions from generated videos to
robots neglects physics, often resulting in inaccurate manipulations. PhysWorld
addresses this limitation by coupling video generation with physical world
reconstruction. Given a single image and a task command, our method generates
task-conditioned videos and reconstructs the underlying physical world from
them; the generated video motions are then grounded into physically accurate
actions through object-centric residual reinforcement learning with the
physical world model. This synergy transforms implicit visual guidance into
physically executable robotic trajectories, eliminating the need for real robot
data collection and enabling zero-shot generalizable robotic manipulation.
Experiments on diverse real-world tasks demonstrate that PhysWorld
substantially improves manipulation accuracy compared to previous approaches.
Visit the project webpage at https://pointscoder.github.io/PhysWorld_Web/ for details.
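
The abstract describes a three-stage data flow: generate a task-conditioned video from an image and a command, reconstruct a physical world model from that video, then learn a residual policy on top of object-centric motions retargeted from the video. The sketch below is only a minimal illustration of that flow under stated assumptions; every name (generate_task_video, reconstruct_world, retarget_base_actions, ResidualPolicy) is a hypothetical stand-in rather than the authors' API, and the stubs use toy data in place of real generative, reconstruction, and simulation components.

```python
# Illustrative sketch of the pipeline described in the abstract.
# All function/class names are hypothetical; the stubs only mirror the data flow:
# video generation -> physical world reconstruction -> object-centric residual RL
# on top of motions retargeted from the generated video.
import numpy as np


def generate_task_video(image: np.ndarray, command: str, num_frames: int = 16) -> np.ndarray:
    """Stand-in for a video generation model conditioned on an image and a command."""
    rng = np.random.default_rng(0)
    return rng.random((num_frames, *image.shape))  # (T, H, W, C) placeholder frames


def reconstruct_world(video: np.ndarray) -> dict:
    """Stand-in for physical world reconstruction from the generated video.

    Here we only return a per-frame object position track; the real system would
    recover geometry, objects, and physics for a simulatable world model.
    """
    t = np.linspace(0.0, 1.0, video.shape[0])
    object_track = np.stack([t, 0.1 * t, np.zeros_like(t)], axis=1)  # (T, 3) positions
    return {"object_track": object_track}


def retarget_base_actions(world: dict) -> np.ndarray:
    """Object-centric retargeting: base motion follows the reconstructed object track."""
    return np.diff(world["object_track"], axis=0)  # (T-1, 3) per-step displacements


class ResidualPolicy:
    """Tiny linear residual policy with a crude score-function (REINFORCE-style) update.

    The executed action is base_action + residual(state); only the residual is learned,
    which is the usual residual-RL setup (the paper's exact algorithm may differ).
    """

    def __init__(self, state_dim: int, action_dim: int, lr: float = 1e-2):
        self.W = np.zeros((action_dim, state_dim))
        self.lr = lr

    def residual(self, state: np.ndarray) -> np.ndarray:
        return self.W @ state

    def update(self, state: np.ndarray, noise: np.ndarray, reward: float) -> None:
        # Push residual weights along exploration directions, weighted by reward.
        self.W += self.lr * reward * np.outer(noise, state)


def train_residual(base_actions: np.ndarray, target: np.ndarray, iters: int = 200) -> ResidualPolicy:
    """Refine retargeted base actions so a toy point-mass rollout reaches `target`."""
    policy = ResidualPolicy(state_dim=3, action_dim=3)
    rng = np.random.default_rng(1)
    for _ in range(iters):
        pos = np.zeros(3)
        for a_base in base_actions:
            noise = 0.05 * rng.standard_normal(3)
            pos = pos + a_base + policy.residual(pos) + noise
            reward = -np.linalg.norm(pos - target)  # toy distance-based reward
            policy.update(pos, noise, reward)
    return policy


if __name__ == "__main__":
    image = np.zeros((64, 64, 3))
    video = generate_task_video(image, "put the cup on the plate")
    world = reconstruct_world(video)
    base = retarget_base_actions(world)
    policy = train_residual(base, target=np.array([1.0, 0.1, 0.0]))
    print("learned residual weights:\n", policy.W)
```

The point of the sketch is the division of labor the abstract emphasizes: the generated video supplies the coarse, object-centric motion prior, while the learned residual only corrects that prior inside a reconstructed physical world, so no real-robot data is needed for training.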