SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Abstract

We present Stable Video Materials 3D (SViM3D), a framework that predicts multi-view consistent physically based rendering (PBR) materials from a single image. Video diffusion models have recently been used to reconstruct 3D objects efficiently from a single image. However, reflectance is still either represented by simple material models or must be estimated in additional steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view, based on explicit camera control. This setup allows our model to serve as a neural prior for relighting and for generating 3D assets. We introduce several mechanisms into this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets for AR/VR, movies, games, and other visual media.
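The abstract does not state which reflectance model SViM3D uses. As a hedged illustration of why per-pixel PBR maps (albedo, roughness, metallic, surface normal) make relighting possible, the sketch below shades a single pixel under a new directional light with a Cook-Torrance GGX microfacet BRDF, a widely used PBR model. The function `shade_pixel`, its parameter set, and the choice of BRDF are assumptions for illustration only, not the paper's renderer.

```python
import math

# Hypothetical illustration: relight one pixel from predicted PBR parameters.
# Assumes a Cook-Torrance GGX BRDF; SViM3D's actual material model may differ.

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade_pixel(albedo, roughness, metallic, normal, light_dir, view_dir,
                light_rgb=(1.0, 1.0, 1.0)):
    """Outgoing RGB radiance of one pixel under a single directional light."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return (0.0, 0.0, 0.0)  # light is behind the surface
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector
    n_dot_v = max(dot(n, v), 1e-4)
    n_dot_h = max(dot(n, h), 0.0)
    v_dot_h = max(dot(v, h), 0.0)

    alpha = max(roughness, 1e-3) ** 2  # perceptual roughness -> GGX alpha
    a2 = alpha * alpha
    # GGX normal distribution function
    d = a2 / (math.pi * (n_dot_h * n_dot_h * (a2 - 1.0) + 1.0) ** 2)
    # Schlick approximation of the Smith geometry term
    k = alpha / 2.0
    g = (n_dot_l / (n_dot_l * (1 - k) + k)) * (n_dot_v / (n_dot_v * (1 - k) + k))
    # Schlick Fresnel: dielectrics use F0 = 0.04, metals tint F0 with albedo
    f0 = tuple(0.04 * (1 - metallic) + c * metallic for c in albedo)
    fres = tuple(f + (1 - f) * (1 - v_dot_h) ** 5 for f in f0)

    spec = tuple(f * d * g / (4 * n_dot_l * n_dot_v) for f in fres)
    diffuse = tuple((1 - metallic) * c / math.pi for c in albedo)  # Lambertian
    return tuple((kd + ks) * li * n_dot_l
                 for kd, ks, li in zip(diffuse, spec, light_rgb))

if __name__ == "__main__":
    # Same material, two different lights: only the lighting input changes.
    pixel = dict(albedo=(0.8, 0.2, 0.2), roughness=0.5, metallic=0.0,
                 normal=(0.0, 0.0, 1.0))
    for light in [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]:
        print(light, shade_pixel(**pixel, light_dir=light, view_dir=(0.0, 0.0, 1.0)))
```

Because the material parameters are separated from the lighting, the same predicted maps can be re-shaded under any new light, which is what a baked-in RGB appearance cannot offer.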
