PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
May 6, 2026
Authors: Yunhan Yang, Chunshi Wang, Junliang Ye, Yang Li, Zanxin Chen, Zehuan Huang, Yao Mu, Zhuo Chen, Chunchao Guo, Xihui Liu
cs.AI
Abstract
Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a "physical architect" to plan a "Hierarchical Physical Blueprint" defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.