SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Abstract

We present Stable Video Materials 3D (SViM3D), a framework that predicts multi-view-consistent physically based rendering (PBR) materials from a single image. Video diffusion models have recently been used to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models, or must be estimated in additional steps, before relighting and controlled appearance edits are possible. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view, based on explicit camera control. This setup allows relighting and 3D asset generation using our model as a neural prior. We introduce various mechanisms to this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games, and other visual media.
