Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
August 20, 2025
Authors: Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, Lingjie Liu
cs.AI
Abstract
Inferring the physical properties of 3D scenes from visual information is a
critical yet challenging task for creating interactive and realistic virtual
worlds. While humans intuitively grasp material characteristics such as
elasticity or stiffness, existing methods often rely on slow, per-scene
optimization, limiting their generalizability and application. To address this
problem, we introduce PIXIE, a novel method that trains a generalizable neural
network to predict physical properties across multiple scenes from 3D visual
features purely using supervised losses. Once trained, our feed-forward network
can perform fast inference of plausible material fields, which, coupled with a
learned static scene representation such as Gaussian Splatting, enable realistic
physics simulation under external forces. To facilitate this research, we also
collected PIXIEVERSE, one of the largest known datasets of paired 3D assets and
physics material annotations. Extensive evaluations demonstrate that PIXIE is
about 1.46-4.39x better and orders of magnitude faster than test-time
optimization methods. By leveraging pretrained visual features like CLIP, our
method can also zero-shot generalize to real-world scenes despite only ever
being trained on synthetic data. https://pixie-3d.github.io/
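The core idea of the feed-forward stage can be pictured as per-point regression from 3D visual features to physical material parameters. The sketch below is purely illustrative and is not the paper's architecture: the feature dimension, the tiny MLP, the choice of output parameters (Young's modulus, Poisson's ratio, density), and the output ranges are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2):
    """Tiny two-layer MLP mapping per-point features to raw material outputs."""
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2 + b2

# Hypothetical setup: 1000 scene points, each carrying a 64-dim distilled
# visual feature (e.g. CLIP-like). The network predicts 3 continuous
# parameters per point: Young's modulus E, Poisson's ratio nu, density rho.
n_pts, feat_dim, hidden, out_dim = 1000, 64, 128, 3
feats = rng.normal(size=(n_pts, feat_dim))          # stand-in for real features
w1 = rng.normal(scale=0.1, size=(feat_dim, hidden)); b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.1, size=(hidden, out_dim));  b2 = np.zeros(out_dim)

raw = mlp_forward(feats, w1, b1, w2, b2)

# Squash raw outputs into plausible physical ranges (illustrative choices).
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
E   = 10 ** (4 + 4 * sig(raw[:, 0]))   # Pa, roughly 1e4 .. 1e8
nu  = 0.49 * sig(raw[:, 1])            # Poisson's ratio in (0, 0.49)
rho = 100 + 1900 * sig(raw[:, 2])      # kg/m^3, roughly 100 .. 2000

# One row of material parameters per 3D point: this field would then be
# handed to a physics simulator alongside the static scene representation.
material_field = np.stack([E, nu, rho], axis=1)
print(material_field.shape)
```

A single forward pass like this replaces per-scene optimization: once the weights are trained with supervised losses on annotated assets, inferring the material field for a new scene is just one batched matrix computation.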