InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

Abstract

We introduce InseRF, a novel method for generative object insertion in NeRF reconstructions of 3D scenes. Given a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes. Methods for 3D scene editing have recently been transformed by the use of strong priors from text-to-image diffusion models in 3D generative modeling. Existing methods are mostly effective at editing 3D scenes through style and appearance changes or by removing existing objects. Generating new objects, however, remains a challenge for such methods, and we address it in this study. Specifically, we propose grounding the 3D object insertion in a 2D object insertion in a reference view of the scene. The 2D edit is then lifted to 3D using a single-view object reconstruction method. The reconstructed object is then inserted into the scene, guided by the priors of monocular depth estimation methods. We evaluate our method on various 3D scenes and provide an in-depth analysis of the proposed components. Our experiments with generative insertion of objects in several 3D scenes indicate the effectiveness of our method compared to existing methods. InseRF is capable of controllable and 3D-consistent object insertion without requiring explicit 3D information as input. Please visit our project page at https://mohamad-shahbazi.github.io/inserf.
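
The abstract says that the reconstructed object is placed in the scene using monocular depth priors, but it does not spell out the placement step. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way a 2D bounding box plus a depth map can yield a 3D anchor point: back-projecting the box center through a pinhole camera. The function name `backproject_bbox_center`, the median-depth heuristic, and the OpenCV-style camera convention (x right, y down, z forward) are all assumptions made for this example.

```python
import numpy as np


def backproject_bbox_center(bbox, depth, K, cam_to_world):
    """Lift the center of a 2D box to a 3D world-space point (hypothetical helper).

    bbox:         (x_min, y_min, x_max, y_max) in pixels.
    depth:        H x W array of z-depth along the camera's forward axis,
                  assumed to be already aligned to the scene's scale.
    K:            3x3 pinhole intrinsics matrix.
    cam_to_world: 4x4 camera-to-world pose (OpenCV-style axes assumed).
    """
    x_min, y_min, x_max, y_max = bbox

    # Pixel coordinates of the box center.
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)

    # Assumption: use the median depth inside the box as a robust estimate.
    patch = depth[int(y_min):int(y_max), int(x_min):int(x_max)]
    z = float(np.median(patch))

    # Pixel -> camera-space direction with unit z, then scale by the depth.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = ray * z

    # Camera space -> world space using homogeneous coordinates.
    p_world = cam_to_world @ np.append(p_cam, 1.0)
    return p_world[:3]


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    depth = np.full((100, 100), 2.0)   # toy depth map: everything 2 units away
    pose = np.eye(4)
    pose[0, 3] = 1.0                   # camera shifted 1 unit along world x
    print(backproject_bbox_center((40, 40, 60, 60), depth, K, pose))
```

NeRF codebases often use an OpenGL-style camera (y up, z backward) instead, in which case the axes would need to be flipped before applying the pose. Monocular depth estimates are also typically defined only up to scale and shift, so a real pipeline would have to align them with the scene's depth first.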
