Fast-SAM3D: 3Dfy Anything in Images but Faster
February 5, 2026
Authors: Weilun Feng, Mingqiang Wu, Zhiliang Chen, Chuanguang Yang, Haotong Qin, Yuqi Li, Xiaokun Liu, Guoxin Fan, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu
cs.AI
Abstract
SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the first systematic investigation into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We demonstrate that these failures stem from neglecting the pipeline's inherent multi-level heterogeneity: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present Fast-SAM3D, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) Modality-Aware Step Caching, which decouples slowly evolving structure from sensitive layout updates; (2) Joint Spatiotemporal Token Carving, which concentrates refinement on high-entropy regions; and (3) Spectral-Aware Token Aggregation, which adapts decoding resolution to geometric frequency content. Extensive experiments demonstrate that Fast-SAM3D delivers up to a 2.67× end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is available at https://github.com/wlfeng0509/Fast-SAM3D.
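To make the caching idea concrete, the following is a minimal, hypothetical Python sketch of what a modality-aware step cache could look like: slowly evolving shape features are reused across several steps, while sensitive layout features are recomputed at every step. All names here (`ModalityAwareStepCache`, `shape_refresh_every`) and the fixed refresh policy are illustrative assumptions, not the released implementation; see the linked repository for the authors' actual code.

```python
# Hypothetical sketch of modality-aware step caching (assumption, not
# the Fast-SAM3D implementation). Shape features are assumed to change
# slowly across generation steps, so they are recomputed only every
# k-th step; layout features are treated as cache-sensitive and are
# always recomputed.

from typing import Any, Callable, Optional


class ModalityAwareStepCache:
    def __init__(self, shape_refresh_every: int = 3):
        self.shape_refresh_every = shape_refresh_every  # reuse window k
        self._cached_shape: Optional[Any] = None

    def shape(self, step: int, compute: Callable[[], Any]) -> Any:
        # Recompute shape features on the first step and every k-th
        # step thereafter; otherwise return the cached activation.
        if self._cached_shape is None or step % self.shape_refresh_every == 0:
            self._cached_shape = compute()
        return self._cached_shape

    def layout(self, compute: Callable[[], Any]) -> Any:
        # Layout updates are sensitive to staleness: never cache them.
        return compute()


# Toy usage: a denoising-style loop showing which path is cached.
cache = ModalityAwareStepCache(shape_refresh_every=3)
for t in range(10):
    s = cache.shape(t, lambda: f"shape-computed-at-step-{t}")
    l = cache.layout(lambda: f"layout-computed-at-step-{t}")
    print(t, s, l)
```

In this sketch the refresh interval is a fixed constant; the abstract's phrase "dynamically aligns computation with instantaneous generation complexity" suggests the real framework chooses when to refresh adaptively rather than on a fixed schedule.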