Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
August 20, 2025
Authors: Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, Lingjie Liu
cs.AI
Abstract
Inferring the physical properties of 3D scenes from visual information is a
critical yet challenging task for creating interactive and realistic virtual
worlds. While humans intuitively grasp material characteristics such as
elasticity or stiffness, existing methods often rely on slow, per-scene
optimization, limiting their generalizability and application. To address this
problem, we introduce PIXIE, a novel method that trains a generalizable neural
network to predict physical properties across multiple scenes from 3D visual
features purely using supervised losses. Once trained, our feed-forward network
can perform fast inference of plausible material fields, which, coupled with a
learned static scene representation such as Gaussian Splatting, enable realistic
physics simulation under external forces. To facilitate this research, we also
collected PIXIEVERSE, one of the largest known datasets of paired 3D assets and
physical material annotations. Extensive evaluations demonstrate that PIXIE is
about 1.46-4.39x better and orders of magnitude faster than test-time
optimization methods. By leveraging pretrained visual features like CLIP, our
method also generalizes zero-shot to real-world scenes despite having been
trained only on synthetic data. https://pixie-3d.github.io/
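
The core idea in the abstract (train once with a supervised loss, then predict a material field for a new scene in a single forward pass) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not PIXIE's architecture or data: a linear model stands in for the neural network, random 3-dimensional vectors stand in for per-point visual features such as CLIP embeddings, and a synthetic scalar target stands in for a material parameter such as a stiffness value.

```python
import random


def predict(weights, bias, feature):
    """Feed-forward prediction of one scalar material value from one feature vector."""
    return sum(w * x for w, x in zip(weights, feature)) + bias


def train(features, targets, lr=0.1, epochs=500):
    """Fit the toy linear predictor by gradient descent on a mean-squared-error loss."""
    dim = len(features[0])
    n = len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, targets):
            err = predict(weights, bias, x) - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def mse(weights, bias, features, targets):
    """Mean squared error between predictions and ground-truth targets."""
    total = sum((predict(weights, bias, x) - y) ** 2 for x, y in zip(features, targets))
    return total / len(features)


# Hypothetical ground-truth mapping from features to a material value (illustration only).
rng = random.Random(0)
true_w, true_b = [1.5, -0.5, 2.0], 4.0

# "Training scenes": per-point features paired with material annotations.
train_x = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
train_y = [predict(true_w, true_b, x) for x in train_x]
weights, bias = train(train_x, train_y)

# "New scene": no per-scene optimization, just one forward pass per point.
new_scene = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
material_field = [predict(weights, bias, x) for x in new_scene]
new_scene_error = mse(weights, bias, new_scene,
                      [predict(true_w, true_b, x) for x in new_scene])
```

The point of the sketch is the split between the two phases: the cost of fitting is paid once across many annotated scenes, while inference on an unseen scene reduces to evaluating the trained predictor, which is what distinguishes this setup from per-scene test-time optimization.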