PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
November 17, 2025
Authors: Ziang Cao, Fangzhou Hong, Zhaoxi Chen, Liang Pan, Ziwei Liu
cs.AI
Abstract
3D modeling is shifting from static visual representations toward physical, articulated assets that can be directly used in simulation and interaction. However, most existing 3D generation methods overlook key physical and articulation properties, thereby limiting their utility in embodied AI. To bridge this gap, we introduce PhysX-Anything, the first simulation-ready physical 3D generative framework that, given a single in-the-wild image, produces high-quality sim-ready 3D assets with explicit geometry, articulation, and physical attributes. Specifically, we propose the first VLM-based physical 3D generative model, along with a new 3D representation that efficiently tokenizes geometry. This representation reduces the number of tokens by 193x, enabling explicit geometry learning within standard VLM token budgets without introducing any special tokens during fine-tuning, and it significantly improves generative quality. In addition, to overcome the limited diversity of existing physical 3D datasets, we construct a new dataset, PhysX-Mobility, which expands the object categories in prior physical 3D datasets by over 2x and includes more than 2K common real-world objects with rich physical annotations. Extensive experiments on PhysX-Mobility and in-the-wild images demonstrate that PhysX-Anything delivers strong generative performance and robust generalization. Furthermore, simulation-based experiments in a MuJoCo-style environment validate that our sim-ready assets can be directly used for contact-rich robotic policy learning. We believe PhysX-Anything can substantially empower a broad range of downstream applications, especially in embodied AI and physics-based simulation.
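The abstract does not describe the tokenized geometry representation in detail, but the minimal sketch below illustrates the general idea of fitting explicit geometry into a standard text-token budget: a coarse binary occupancy grid is run-length encoded into plain digit strings drawn from the VLM's existing vocabulary, so no special tokens are required. The function name, grid resolution, and encoding scheme here are illustrative assumptions, not the paper's actual representation.

```python
import numpy as np

def tokenize_geometry(occupancy: np.ndarray) -> str:
    """Serialize a coarse binary occupancy grid as plain text.

    Hypothetical sketch: the flattened grid is run-length encoded so
    that large empty or solid regions collapse into a few digit tokens,
    all expressible in a VLM's ordinary text vocabulary. The actual
    PhysX-Anything representation may differ substantially.
    """
    flat = occupancy.astype(np.uint8).ravel()
    runs, count, current = [], 1, flat[0]
    for v in flat[1:]:
        if v == current:
            count += 1
        else:
            runs.append(f"{current}:{count}")
            current, count = v, 1
    runs.append(f"{current}:{count}")
    return " ".join(runs)

# A 32^3 grid has 32,768 cells, but a grid containing one solid block
# compresses to a few hundred run tokens, illustrating how explicit
# geometry can be made to fit a fixed token budget.
grid = np.zeros((32, 32, 32), dtype=bool)
grid[8:24, 8:24, 8:24] = True
tokens = tokenize_geometry(grid)
print(len(tokens.split()))  # far fewer tokens than one per voxel
```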
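Since the sim-ready assets are validated in a MuJoCo-style environment, a rollout loop like the one below is the kind of consumer they target. This sketch assumes the generated asset is exported as an MJCF file named generated_asset.xml (a hypothetical filename; the export format is not stated in the abstract) and uses the official mujoco Python bindings.

```python
import mujoco
import numpy as np

# Load a generated asset; "generated_asset.xml" is a placeholder name,
# assuming the framework exports MJCF for MuJoCo-style simulators.
model = mujoco.MjModel.from_xml_path("generated_asset.xml")
data = mujoco.MjData(model)

# Drive the articulated joints with random actions for a short episode,
# the kind of contact-rich interaction a policy-learning loop performs.
for _ in range(1000):
    data.ctrl[:] = np.random.uniform(-1.0, 1.0, size=model.nu)
    mujoco.mj_step(model, data)

print("final joint positions:", data.qpos)
```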