SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
October 9, 2025
Authors: Andreas Engelhardt, Mark Boss, Vikram Voleti, Chun-Han Yao, Hendrik P. A. Lensch, Varun Jampani
cs.AI
Abstract
We present Stable Video Materials 3D (SViM3D), a framework that predicts multi-view consistent physically based rendering (PBR) materials from a single image. Recently, video diffusion models have been used successfully to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or must be estimated in additional steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view, under explicit camera control. This unique setup allows relighting and generation of a 3D asset using our model as a neural prior. We introduce several mechanisms into this pipeline that improve quality in this ill-posed setting. We demonstrate state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games, and other visual media.
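To make the relighting use case concrete, the sketch below shows how per-pixel PBR maps of the kind the abstract describes (albedo, roughness, metallic, surface normal) could be shaded under a single directional light. This is not the paper's renderer: the function name, array shapes, and the simplified Cook-Torrance/GGX shading are illustrative assumptions.

```python
# Minimal, illustrative relighting of predicted per-pixel PBR maps under one
# directional light. NOT the paper's implementation; names, shapes, and the
# simplified Cook-Torrance shading below are assumptions for illustration.
import numpy as np

def relight(albedo, roughness, metallic, normal, light_dir, view_dir, light_rgb):
    """albedo: (H,W,3) in [0,1]; roughness, metallic: (H,W,1);
    normal: (H,W,3) unit vectors; light_dir, view_dir: (3,); light_rgb: (3,)."""
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)                    # half vector

    n_dot_l = np.clip(normal @ l, 0.0, None)[..., None]    # (H,W,1)
    n_dot_v = np.clip(normal @ v, 1e-4, None)[..., None]
    n_dot_h = np.clip(normal @ h, 0.0, None)[..., None]

    # GGX normal distribution term
    a2 = np.clip(roughness, 1e-3, 1.0) ** 4
    d = a2 / (np.pi * ((n_dot_h ** 2) * (a2 - 1.0) + 1.0) ** 2)

    # Schlick Fresnel with a metallic-workflow base reflectance
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    f = f0 + (1.0 - f0) * (1.0 - np.clip(h @ v, 0.0, 1.0)) ** 5

    # Smith-style visibility approximation
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_l / (n_dot_l * (1 - k) + k)) * (n_dot_v / (n_dot_v * (1 - k) + k))

    specular = d * f * g / (4.0 * n_dot_v + 1e-4)
    diffuse = (1.0 - metallic) * albedo / np.pi            # energy-split diffuse
    return (diffuse + specular) * light_rgb * n_dot_l      # (H,W,3) linear RGB
```

Image-based relighting under an environment map would follow the same pattern, summing this shading over sampled light directions weighted by the environment radiance; the point of predicting multi-view consistent PBR maps is that such a renderer can then be driven per view with arbitrary lighting.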