V3D: Video Diffusion Models are Effective 3D Generators

Abstract

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated generation, but they usually produce less-detailed objects because of limited model capacity or limited 3D data. Motivated by recent advances in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency. Our code is available at https://github.com/heheyas/V3D.
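To make the notion of "360-degree orbit frames" concrete, the following is a minimal sketch of camera positions evenly spaced on a circular orbit around an object at the origin. It is not taken from the V3D code base: the function name `orbit_camera_positions`, the z-up convention, and the default radius, elevation, and frame count are all illustrative assumptions.

```python
import math


def orbit_camera_positions(num_frames, radius=2.0, elevation_deg=0.0):
    """Return camera positions evenly spaced on a 360-degree orbit.

    Hypothetical helper for illustration only. Assumes the object sits at
    the origin and that z is the up axis.
    """
    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_frames):
        # Azimuth sweeps the full circle; the last frame stops one step
        # short of 360 degrees so the first frame is not repeated.
        azimuth = 2.0 * math.pi * i / num_frames
        x = radius * math.cos(elevation) * math.cos(azimuth)
        y = radius * math.cos(elevation) * math.sin(azimuth)
        z = radius * math.sin(elevation)
        positions.append((x, y, z))
    return positions


# Example: an assumed 18-frame orbit at zero elevation.
positions = orbit_camera_positions(num_frames=18)
```

Each position lies at the same distance from the object, so a sequence of frames rendered or generated from these viewpoints forms one full turn around it.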
