InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

Abstract

We introduce InseRF, a novel method for generative object insertion in NeRF reconstructions of 3D scenes. Given a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes. Methods for 3D scene editing have recently been profoundly transformed by the use of strong priors from text-to-image diffusion models in 3D generative modeling. Existing methods are mostly effective at editing 3D scenes through style and appearance changes or by removing existing objects. Generating new objects, however, remains a challenge for such methods, and it is the problem we address in this study. Specifically, we propose grounding the 3D object insertion in a 2D object insertion in a reference view of the scene. The 2D edit is then lifted to 3D using a single-view object reconstruction method. The reconstructed object is then inserted into the scene, guided by the priors of monocular depth estimation methods. We evaluate our method on various 3D scenes and provide an in-depth analysis of the proposed components. Our experiments with generative insertion of objects in several 3D scenes indicate the effectiveness of our method compared to existing methods. InseRF is capable of controllable and 3D-consistent object insertion without requiring explicit 3D information as input. Please visit our project page at https://mohamad-shahbazi.github.io/inserf.
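
The abstract does not spell out how the reconstructed object is positioned in the scene. As a rough illustration of how a 2D bounding box and a depth estimate can together determine a 3D location, the sketch below back-projects the box center through a pinhole camera. It is not the authors' procedure: the function name `backproject_bbox_center`, the use of the median depth inside the box, the OpenCV-style camera convention (z pointing forward), and the assumption that the depth map is already expressed in scene units are all choices made here for illustration.

```python
import numpy as np


def backproject_bbox_center(bbox, depth, K, cam_to_world):
    """Back-project the center of a 2D box to a 3D point in world coordinates.

    bbox:          (x_min, y_min, x_max, y_max) in pixels (hypothetical layout).
    depth:         HxW array of z-depth per pixel, assumed to be in scene units.
    K:             3x3 pinhole intrinsics matrix.
    cam_to_world:  4x4 camera-to-world transform of the reference view.
    """
    x_min, y_min, x_max, y_max = bbox

    # Pixel coordinates of the box center.
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)

    # Median depth inside the box: an assumed, outlier-tolerant choice.
    patch = depth[int(y_min):int(y_max), int(x_min):int(x_max)]
    z = float(np.median(patch))

    # Pixel -> camera coordinates: scale the normalized ray by the depth.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = ray * z

    # Camera -> world coordinates via the homogeneous transform.
    p_world = cam_to_world @ np.append(p_cam, 1.0)
    return p_world[:3]
```

Monocular depth estimators often predict depth only up to an unknown scale and shift, so in practice the depth map would first need to be aligned to the scene before a placement like this is meaningful.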
