SAM 3D: 3Dfy Anything in Images
November 20, 2025
Authors: SAM 3D Team, Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, Aohan Lin, Jiawei Liu, Ziqi Ma, Anushka Sagar, Bowen Song, Xiaodong Wang, Jianing Yang, Bowen Zhang, Piotr Dollár, Georgia Gkioxari, Matt Feiszli, Jitendra Malik
cs.AI
Abstract
We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.
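To make the abstract's three predicted quantities concrete, the sketch below shows one plausible data layout for a per-object result: triangle-mesh geometry, a texture image, and a 6-DoF pose ("layout") placing the object in the camera frame. This is a minimal illustration under assumed conventions; the class and field names are hypothetical and do not reflect the released SAM 3D code or API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectReconstruction:
    """Hypothetical container for the three quantities the paper says
    SAM 3D predicts from a single image: geometry, texture, and layout.
    Field names and shapes are illustrative assumptions, not the real API."""
    vertices: np.ndarray     # (V, 3) mesh vertex positions, object frame
    faces: np.ndarray        # (F, 3) triangle indices into `vertices`
    uvs: np.ndarray          # (V, 2) per-vertex texture coordinates
    texture: np.ndarray      # (H, W, 3) RGB texture image in [0, 1]
    rotation: np.ndarray     # (3, 3) object-to-camera rotation R
    translation: np.ndarray  # (3,) object-to-camera translation t

    def vertices_in_camera_frame(self) -> np.ndarray:
        """Apply the predicted layout: map each vertex v to R v + t,
        moving the mesh from the object frame into the camera frame."""
        return self.vertices @ self.rotation.T + self.translation
```

Under this reading, "layout" is the rigid transform that situates each reconstructed object in the shared scene, which is what lets per-object predictions from a cluttered image be composed into a single consistent 3D scene.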