ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

March 24, 2026
Authors: Yuzhi Chen, Ronghan Chen, Dongjie Huo, Yandan Yang, Dekang Qi, Haoyun Liu, Tong Lin, Shuang Zeng, Junjin Xiao, Xinyuan Chang, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu
cs.AI

Abstract

Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations (such as object penetration and anti-gravity motion) because they are trained on generic visual data with likelihood-based objectives that ignore physical laws. We present ABot-PhysWorld, a 14B-parameter Diffusion Transformer that generates visually realistic, physically plausible, and action-controllable videos. Built on a curated dataset of three million manipulation clips with physics-aware annotation, it uses a novel DPO-based post-training framework with decoupled discriminators to suppress unphysical behaviors while preserving visual quality. A parallel context block enables precise spatial action injection for cross-embodiment control. To better evaluate generalization, we introduce EZSbench, the first training-independent embodied zero-shot benchmark, which combines real and synthetic unseen robot-task-scene combinations. It employs a decoupled protocol to assess physical realism and action alignment separately. ABot-PhysWorld achieves new state-of-the-art performance on PBench and EZSbench, surpassing Veo 3.1 and Sora v2 Pro in physical plausibility and trajectory consistency. We will release EZSbench to promote standardized evaluation in embodied video generation.
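
As background for the DPO-based post-training mentioned in the abstract, the following is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It is not the authors' framework: the abstract does not say how the decoupled discriminators or the diffusion model's likelihoods enter the objective. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: log-likelihood of the preferred / rejected sample
        under the model being trained (hypothetical inputs).
    ref_logp_w / ref_logp_l: the same quantities under a frozen
        reference model.
    beta: strength of the implicit constraint toward the reference
        model (assumed value, not taken from the paper).
    """
    # How much more the trained model favors the preferred sample
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In a physics-alignment setting, the preferred sample would presumably be a physically plausible clip and the rejected one a clip showing a violation such as object penetration; how such pairs are actually constructed in ABot-PhysWorld is not described in the abstract.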