ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors

Abstract

We introduce ShowRoom3D, a three-stage approach for generating high-quality 3D room-scale scenes from text. Previous methods that optimize neural radiance fields (NeRF) with 2D diffusion priors to generate room-scale scenes have produced unsatisfactory quality. This is primarily attributed to two factors: 2D priors lack 3D awareness, and the training methodology is constrained. In this paper, we use a 3D diffusion prior, MVDiffusion, to optimize the 3D room-scale scene. Our contributions are twofold. First, we propose a progressive view selection process to optimize NeRF, which divides training into three stages and gradually expands the camera sampling scope. Second, we propose a pose transformation method in the second stage, which ensures that MVDiffusion provides accurate view guidance. As a result, ShowRoom3D generates rooms with improved structural integrity, enhanced clarity from any view, reduced content repetition, and higher consistency across different perspectives. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by a large margin in a user study.
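
The progressive view selection idea can be illustrated with a minimal sketch. It assumes training is split evenly into three stages and that each stage widens the range of camera positions and pitch angles. The abstract gives no concrete ranges or schedule, so `StageConfig`, `STAGES`, `stage_for_step`, `sample_camera`, and every numeric value below are hypothetical and are not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class StageConfig:
    """Hypothetical sampling limits for one training stage."""
    max_offset: float     # max camera offset from the room center, per axis
    max_pitch_deg: float  # max absolute up/down tilt in degrees


# Assumed values for illustration only; the source gives no numbers.
STAGES = [
    StageConfig(max_offset=0.0, max_pitch_deg=0.0),   # stage 1: narrowest scope
    StageConfig(max_offset=0.3, max_pitch_deg=15.0),  # stage 2: wider scope
    StageConfig(max_offset=0.6, max_pitch_deg=30.0),  # stage 3: widest scope
]


def stage_for_step(step, total_steps, num_stages=3):
    """Map a training step to a stage index, assuming an even split."""
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    return min(step * num_stages // total_steps, num_stages - 1)


def sample_camera(stage_index, rng):
    """Sample a camera (position, yaw, pitch) within the stage's limits."""
    cfg = STAGES[stage_index]
    position = tuple(
        rng.uniform(-cfg.max_offset, cfg.max_offset) for _ in range(3)
    )
    yaw_deg = rng.uniform(0.0, 360.0)  # full horizontal rotation in all stages
    pitch_deg = rng.uniform(-cfg.max_pitch_deg, cfg.max_pitch_deg)
    return position, yaw_deg, pitch_deg


if __name__ == "__main__":
    rng = random.Random(0)
    total_steps = 300
    for step in (0, 150, 299):
        stage = stage_for_step(step, total_steps)
        print(step, stage, sample_camera(stage, rng))
```

In this sketch, early steps draw cameras only from the room center with level views, and later steps draw from progressively larger regions, which mirrors the gradual expansion of the sampling scope described above.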
