Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
August 20, 2025
Authors: Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, Lingjie Liu
cs.AI
Abstract
Inferring the physical properties of 3D scenes from visual information is a
critical yet challenging task for creating interactive and realistic virtual
worlds. While humans intuitively grasp material characteristics such as
elasticity or stiffness, existing methods often rely on slow, per-scene
optimization, limiting their generalizability and application. To address this
problem, we introduce PIXIE, a novel method that trains a generalizable neural
network to predict physical properties across multiple scenes from 3D visual
features purely using supervised losses. Once trained, our feed-forward network
can perform fast inference of plausible material fields, which, coupled with a
learned static scene representation such as Gaussian Splatting, enable realistic
physics simulation under external forces. To facilitate this research, we also
collected PIXIEVERSE, one of the largest known datasets of paired 3D assets and
physics material annotations. Extensive evaluations demonstrate that PIXIE is
about 1.46-4.39x better and orders of magnitude faster than test-time
optimization methods. By leveraging pretrained visual features like CLIP, our
method can also zero-shot generalize to real-world scenes despite only ever
being trained on synthetic data. https://pixie-3d.github.io/
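The core idea of the feed-forward stage can be pictured as per-point regression from 3D visual features to physical material parameters. The sketch below is purely illustrative and is not the paper's architecture: the feature dimension, the tiny MLP, the choice of output parameters (Young's modulus, Poisson's ratio, density), and the output ranges are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2):
    """Tiny two-layer MLP mapping per-point features to raw material outputs."""
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2 + b2

# Hypothetical setup: 1000 scene points, each carrying a 64-dim distilled
# visual feature (e.g. CLIP-like). The network predicts 3 continuous
# parameters per point: Young's modulus E, Poisson's ratio nu, density rho.
n_pts, feat_dim, hidden, out_dim = 1000, 64, 128, 3
feats = rng.normal(size=(n_pts, feat_dim))          # stand-in for real features
w1 = rng.normal(scale=0.1, size=(feat_dim, hidden)); b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.1, size=(hidden, out_dim));  b2 = np.zeros(out_dim)

raw = mlp_forward(feats, w1, b1, w2, b2)

# Squash raw outputs into plausible physical ranges (illustrative choices).
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
E   = 10 ** (4 + 4 * sig(raw[:, 0]))   # Pa, roughly 1e4 .. 1e8
nu  = 0.49 * sig(raw[:, 1])            # Poisson's ratio in (0, 0.49)
rho = 100 + 1900 * sig(raw[:, 2])      # kg/m^3, roughly 100 .. 2000

# One row of material parameters per 3D point: this field would then be
# handed to a physics simulator alongside the static scene representation.
material_field = np.stack([E, nu, rho], axis=1)
print(material_field.shape)
```

A single forward pass like this replaces per-scene optimization: once the weights are trained with supervised losses on annotated assets, inferring the material field for a new scene is just one batched matrix computation.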