PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

November 17, 2025
Authors: Ziang Cao, Fangzhou Hong, Zhaoxi Chen, Liang Pan, Ziwei Liu
cs.AI

Abstract

3D modeling is shifting from static visual representations toward physical, articulated assets that can be directly used in simulation and interaction. However, most existing 3D generation methods overlook key physical and articulation properties, thereby limiting their utility in embodied AI. To bridge this gap, we introduce PhysX-Anything, the first simulation-ready physical 3D generative framework that, given a single in-the-wild image, produces high-quality sim-ready 3D assets with explicit geometry, articulation, and physical attributes. Specifically, we propose the first VLM-based physical 3D generative model, along with a new 3D representation that efficiently tokenizes geometry. This representation reduces the token count by 193x, enabling explicit geometry learning within standard VLM token budgets without introducing any special tokens during fine-tuning, while significantly improving generative quality. In addition, to overcome the limited diversity of existing physical 3D datasets, we construct a new dataset, PhysX-Mobility, which expands the object categories in prior physical 3D datasets by over 2x and includes more than 2K common real-world objects with rich physical annotations. Extensive experiments on PhysX-Mobility and in-the-wild images demonstrate that PhysX-Anything delivers strong generative performance and robust generalization. Furthermore, simulation-based experiments in a MuJoCo-style environment validate that our sim-ready assets can be directly used for contact-rich robotic policy learning. We believe PhysX-Anything can substantially empower a broad range of downstream applications, especially in embodied AI and physics-based simulation.
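
The abstract states that the sim-ready assets can be used directly in a MuJoCo-style environment. The minimal sketch below shows what "directly usable" entails with the official `mujoco` Python bindings: load the asset, then step a contact-rich rollout. The filename `generated_asset.xml` is a hypothetical stand-in for a PhysX-Anything MJCF export; the paper's actual export format and loading API are not described in this abstract.

```python
# Minimal sketch: load a simulation-ready asset into MuJoCo and step it.
# "generated_asset.xml" is a hypothetical MJCF file standing in for a
# PhysX-Anything export (the real export format is not given in the abstract).
import mujoco

model = mujoco.MjModel.from_xml_path("generated_asset.xml")
data = mujoco.MjData(model)

mujoco.mj_resetData(model, data)
for _ in range(1000):
    # A learned policy would write actuator commands here;
    # zeros keep the articulated asset passive under contact and gravity.
    data.ctrl[:] = 0.0
    mujoco.mj_step(model, data)

# Joint positions of the articulated asset, as a policy would observe them.
print(data.qpos)
```

Because MJCF carries joint definitions and physical attributes (mass, friction, joint limits) alongside geometry, an asset that exports cleanly to it needs no manual rigging before policy learning, which is the practical meaning of "simulation-ready" here.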