SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

October 9, 2025
Authors: Andreas Engelhardt, Mark Boss, Vikram Voleti, Chun-Han Yao, Hendrik P. A. Lensch, Varun Jampani
cs.AI

Abstract

We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or needs to be estimated in additional steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view based on explicit camera control. This unique setup allows for relighting and generating a 3D asset using our model as a neural prior. We introduce various mechanisms to this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games and other visual media.
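To make concrete why jointly predicting spatially varying PBR parameters and surface normals enables relighting, the minimal sketch below shades a single pixel with a Cook-Torrance (GGX) BRDF under one directional light. This is an illustrative example under standard PBR conventions, not the paper's actual rendering or diffusion pipeline; the function and parameter names (`shade_pixel`, `albedo`, `roughness`, `metallic`) are assumptions made for the demo.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def shade_pixel(albedo, roughness, metallic, normal, view_dir, light_dir, light_radiance):
    """Shade one pixel with a Cook-Torrance (GGX) BRDF from per-pixel PBR maps.

    albedo: (3,) base color; roughness, metallic: scalars in [0, 1];
    normal, view_dir, light_dir: 3D vectors; light_radiance: (3,) light color/intensity.
    """
    albedo = np.asarray(albedo, dtype=float)
    light_radiance = np.asarray(light_radiance, dtype=float)
    n, v, l = normalize(normal), normalize(view_dir), normalize(light_dir)
    h = normalize(v + l)                          # half vector between view and light
    n_dot_l = max(float(np.dot(n, l)), 0.0)
    n_dot_v = max(float(np.dot(n, v)), 1e-4)
    n_dot_h = max(float(np.dot(n, h)), 0.0)
    v_dot_h = max(float(np.dot(v, h)), 0.0)

    # GGX normal distribution term (a = roughness^2)
    a2 = (roughness ** 2) ** 2
    ndf = a2 / (np.pi * (n_dot_h ** 2 * (a2 - 1.0) + 1.0) ** 2 + 1e-7)

    # Smith/Schlick-GGX geometry (shadowing-masking) term for direct lighting
    k = (roughness + 1.0) ** 2 / 8.0
    geo = (n_dot_v / (n_dot_v * (1.0 - k) + k)) * (n_dot_l / (n_dot_l * (1.0 - k) + k))

    # Schlick Fresnel; metals tint the reflection with the albedo
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    fresnel = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

    specular = ndf * geo * fresnel / (4.0 * n_dot_v * n_dot_l + 1e-7)
    diffuse = (1.0 - fresnel) * (1.0 - metallic) * albedo / np.pi
    return (diffuse + specular) * light_radiance * n_dot_l

# Relight the same pixel under two different light directions: only the lighting
# inputs change, while the predicted material parameters stay fixed.
albedo = [0.8, 0.2, 0.2]
lit_front = shade_pixel(albedo, 0.4, 0.0, [0, 0, 1], [0, 0, 1], [0.3, 0.3, 1.0], [1, 1, 1])
lit_side = shade_pixel(albedo, 0.4, 0.0, [0, 0, 1], [0, 0, 1], [1.0, 0.0, 0.3], [1, 1, 1])
print(lit_front, lit_side)
```

Because the material maps are fixed per pixel, swapping the light direction or radiance re-renders the same surface under new illumination; this is the sense in which multi-view consistent PBR and normal outputs make a generated 3D asset relightable.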