Robot Learning from a Physical World Model

November 10, 2025
Authors: Jiageng Mao, Sicheng He, Hao-Ning Wu, Yang You, Shuyang Sun, Zhicheng Wang, Yanan Bao, Huizhong Chen, Leonidas Guibas, Vitor Guizilini, Howard Zhou, Yue Wang
cs.AI

Abstract

We introduce PhysWorld, a framework that enables robot learning from video generation through physical world modeling. Recent video generation models can synthesize photorealistic visual demonstrations from language commands and images, offering a powerful yet underexplored source of training signals for robotics. However, directly retargeting pixel motions from generated videos to robots neglects physics, often resulting in inaccurate manipulation. PhysWorld addresses this limitation by coupling video generation with physical world reconstruction. Given a single image and a task command, our method generates a task-conditioned video, reconstructs the underlying physical world from it, and grounds the generated video motions into physically accurate actions through object-centric residual reinforcement learning with the physical world model. This synergy transforms implicit visual guidance into physically executable robotic trajectories, eliminating the need for real robot data collection and enabling zero-shot generalizable robotic manipulation. Experiments on diverse real-world tasks demonstrate that PhysWorld substantially improves manipulation accuracy compared to previous approaches. See the project webpage for details: https://pointscoder.github.io/PhysWorld_Web/
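
To make the residual-learning step concrete, here is a minimal, self-contained Python sketch of the general pattern the abstract describes: execute a reference motion retargeted from the generated video, and let a learned residual correct for physical errors inside a world model. Everything in it (the ToyWorld class, the biased reference trajectory, and the random-search tuning of a single residual gain) is a hypothetical stand-in for illustration, not the authors' actual environment, policy, or training algorithm.

# Toy sketch of residual action composition: a = a_ref + residual.
# ToyWorld, the reference actions, and the random-search "training" are
# hypothetical placeholders, not PhysWorld's real components.
import numpy as np

rng = np.random.default_rng(0)

class ToyWorld:
    """Stand-in for the reconstructed physical world model (1-D object motion)."""
    def __init__(self, goal=1.0):
        self.goal = goal
    def reset(self):
        self.x = 0.0
        return np.array([self.x])
    def step(self, action):
        self.x += float(action)            # trivially integrate the commanded motion
        reward = -abs(self.goal - self.x)  # closer to the goal = higher reward
        return np.array([self.x]), reward

# Reference actions "retargeted" from a generated video: deliberately biased,
# so executing them open-loop overshoots the goal (mimicking the physical
# inaccuracy of copying pixel motions directly).
reference_actions = np.full(20, 0.06)

def rollout(residual_gain, world):
    """Execute the reference motion plus a state-dependent residual correction."""
    obs = world.reset()
    total_reward = 0.0
    for a_ref in reference_actions:
        residual = residual_gain * (world.goal - obs[0])  # simple linear residual
        obs, reward = world.step(a_ref + residual)        # a = a_ref + residual
        total_reward += reward
    return total_reward

# Crude random-search stand-in for residual reinforcement learning: tune one
# residual gain entirely inside the world model, with no real-robot data.
world = ToyWorld()
best_gain, best_return = 0.0, rollout(0.0, world)
for _ in range(200):
    gain = rng.uniform(0.0, 1.0)
    ret = rollout(gain, world)
    if ret > best_return:
        best_gain, best_return = gain, ret

print(f"return without residual: {rollout(0.0, world):.3f}")
print(f"return with learned residual (gain={best_gain:.2f}): {best_return:.3f}")

The point of the a = a_ref + residual composition is that the policy only has to learn small corrections around a motion that is already approximately right, which is what allows training to happen entirely inside the reconstructed world rather than on a real robot.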