NeRFiller: Completing Scenes via Generative 3D Inpainting

Abstract

We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Parts of a captured 3D scene or object are often missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions such as the bottom of an object, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models: they generate more 3D-consistent inpaints when the images form a 2×2 grid. We show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D-consistent and plausible scene completions. Our project page is at https://ethanweber.me/nerfiller.
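
The 2×2 grid arrangement mentioned in the abstract can be illustrated with a minimal sketch. It only shows how four equally sized views might be tiled into one image and split apart again. The function names `tile_2x2` and `untile_2x2` are hypothetical, NumPy arrays are assumed as the image representation, and the diffusion inpainting step itself is omitted; this is not the authors' implementation.

```python
import numpy as np


def tile_2x2(images):
    """Arrange four equally sized H x W x C arrays into one 2H x 2W x C grid."""
    if len(images) != 4:
        raise ValueError("expected exactly four images")
    top = np.concatenate(images[:2], axis=1)     # left and right of the top row
    bottom = np.concatenate(images[2:], axis=1)  # left and right of the bottom row
    return np.concatenate([top, bottom], axis=0)


def untile_2x2(grid):
    """Split a 2H x 2W x C grid back into its four H x W x C tiles."""
    h, w = grid.shape[0] // 2, grid.shape[1] // 2
    return [grid[:h, :w], grid[:h, w:], grid[h:, :w], grid[h:, w:]]


# Four dummy 4x4 RGB views, each filled with a distinct constant value.
views = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(4)]
grid = tile_2x2(views)

# A 2D inpainting model would be applied to `grid` (together with a mask
# tiled the same way) at this point; that step is skipped in this sketch.
tiles = untile_2x2(grid)
```

Masks for the missing regions could be tiled with the same helper, so that a single inpainting call sees all four views at once.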
