
ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

March 24, 2026
Authors: Yuzhi Chen, Ronghan Chen, Dongjie Huo, Yandan Yang, Dekang Qi, Haoyun Liu, Tong Lin, Shuang Zeng, Junjin Xiao, Xinyuan Chang, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu
cs.AI

Abstract

Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations, such as object penetration and anti-gravity motion, because they are trained on generic visual data with likelihood-based objectives that ignore physical laws. We present ABot-PhysWorld, a 14B-parameter Diffusion Transformer that generates visually realistic, physically plausible, and action-controllable videos. Built on a curated dataset of three million manipulation clips with physics-aware annotations, it uses a novel DPO-based post-training framework with decoupled discriminators to suppress unphysical behaviors while preserving visual quality. A parallel context block enables precise spatial action injection for cross-embodiment control. To better evaluate generalization, we introduce EZSbench, the first training-independent embodied zero-shot benchmark, combining real and synthetic unseen robot-task-scene combinations. It employs a decoupled protocol that separately assesses physical realism and action alignment. ABot-PhysWorld achieves new state-of-the-art performance on PBench and EZSbench, surpassing Veo 3.1 and Sora v2 Pro in physical plausibility and trajectory consistency. We will release EZSbench to promote standardized evaluation in embodied video generation.
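The abstract names a DPO-based post-training framework with decoupled discriminators but does not give its objective. As a rough illustration only, the sketch below shows the standard DPO loss (Rafailov et al., 2023) for one preference pair, plus a hypothetical pairing rule that uses two separate scores (physics and visual quality) so that a chosen pair differs mainly in physical plausibility. Everything here is an assumption, not the authors' method: `select_pair`, `visual_tol`, the score ranges, and the `beta=0.1` default are illustrative. The scalar log-probabilities also stand in for whatever likelihood surrogate a diffusion model would actually use (diffusion variants of DPO typically replace them with denoising-error terms).

```python
import math
from typing import List, Optional, Tuple

# Hypothetical candidate record: (clip_id, physics_score, visual_score), scores in [0, 1].
Candidate = Tuple[str, float, float]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logp_w: float, policy_logp_l: float,
             ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair (w = preferred, l = rejected).

    loss = -log sigmoid(margin) = softplus(-margin), where
    margin = beta * [(logp_w - ref_w) - (logp_l - ref_l)].
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return softplus(-margin)


def select_pair(cands: List[Candidate],
                visual_tol: float = 0.05) -> Optional[Tuple[str, str]]:
    """Hypothetical pairing rule built on two decoupled scores.

    Winner: highest physics score. Loser: lowest physics score among the other
    candidates whose visual score is within `visual_tol` of the winner's, so the
    preference signal targets physics rather than image quality.
    """
    if len(cands) < 2:
        return None
    winner = max(cands, key=lambda c: c[1])
    pool = [c for c in cands
            if c[0] != winner[0]
            and c[1] < winner[1]
            and abs(c[2] - winner[2]) <= visual_tol]
    if not pool:
        return None
    loser = min(pool, key=lambda c: c[1])
    return winner[0], loser[0]
```

For example, with candidates `("a", 0.9, 0.80)`, `("b", 0.3, 0.82)`, and `("c", 0.1, 0.50)`, the rule pairs `a` over `b`. Clip `c` is excluded even though its physics score is lowest, because its visual score differs too much from the winner's. Keeping visual quality roughly matched within a pair is one plausible way to "suppress unphysical behaviors while preserving visual quality," but the paper's actual discriminator design may differ.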
March 26, 2026