Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

Abstract

In this paper, we target the adaptive source-driven 3D scene editing task by proposing CustomNeRF, a model that unifies a text description or a reference image as the editing prompt. Obtaining editing results that conform to the editing prompt is nontrivial, however, because of two significant challenges: editing only the foreground regions accurately, and maintaining multi-view consistency when only a single-view reference image is given. To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing, aiming at foreground-only manipulation while preserving the background. For the second challenge, we design a class-guided regularization that exploits class priors within the generation model to alleviate inconsistency among different views in image-driven editing. Extensive experiments show that CustomNeRF produces precise editing results on various real scenes in both text- and image-driven settings.
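
The alternation between foreground-only and full-image editing described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the strict every-other-step schedule, the constant background fill, and the names `lgie_schedule` and `render_for_step` are all assumptions made for illustration, and no NeRF or generative model is involved.

```python
from typing import List


def lgie_schedule(num_steps: int) -> List[str]:
    """Hypothetical schedule: alternate local and global editing steps.

    Assumption: strict 1:1 alternation. The abstract only says the scheme
    "alternates" between the two modes.
    """
    return ["local" if step % 2 == 0 else "global" for step in range(num_steps)]


def render_for_step(
    mode: str, rgb: List[float], fg_mask: List[bool], fill: float = 0.0
) -> List[float]:
    """Return the pixels that would receive the editing signal at this step.

    "global": the full rendered image is used.
    "local":  background pixels are replaced by a constant fill value, so
              only foreground pixels carry content (an assumed masking
              strategy, not taken from the paper).
    """
    if mode == "global":
        return list(rgb)
    if mode == "local":
        return [pixel if is_fg else fill for pixel, is_fg in zip(rgb, fg_mask)]
    raise ValueError(f"unknown mode: {mode}")


# Toy usage: three grayscale "pixels", the last two belong to the foreground.
pixels = [0.2, 0.5, 0.9]
mask = [False, True, True]
for mode in lgie_schedule(4):
    supervised = render_for_step(mode, pixels, mask)
```

In a real pipeline, each step would render a view from the NeRF and apply a guidance loss from the generative model to the selected pixels; the sketch only shows how the two modes differ in what they expose to that loss.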
