
SAM 3D: 3Dfy Anything in Images

November 20, 2025
作者: SAM 3D Team, Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, Aohan Lin, Jiawei Liu, Ziqi Ma, Anushka Sagar, Bowen Song, Xiaodong Wang, Jianing Yang, Bowen Zhang, Piotr Dollár, Georgia Gkioxari, Matt Feiszli, Jitendra Malik
cs.AI

Abstract

We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.
December 1, 2025