NeRFiller: Completing Scenes via Generative 3D Inpainting

Abstract

We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Parts of a captured 3D scene or object are often missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models: they generate more 3D-consistent inpaints when the images form a 2×2 grid, and we show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D-consistent and plausible scene completions. Our project page is at https://ethanweber.me/nerfiller.
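To make the 2×2 grid idea concrete, the following is a minimal sketch of how four views could be tiled into a single grid image before inpainting and split apart again afterwards. It is an illustration only, not the authors' implementation: the helper names `tile_2x2` and `untile_2x2` are hypothetical, nested Python lists stand in for image arrays, and the call to the 2D inpainting diffusion model is left out entirely.

```python
def tile_2x2(images):
    """Tile four equally sized images into one 2x2 grid (row-major order).

    Each image is a list of rows; each row is a list of pixel values.
    Hypothetical helper for illustration, not part of NeRFiller's code.
    """
    if len(images) != 4:
        raise ValueError("expected exactly four images")
    a, b, c, d = images
    # Place a and b side by side on top, c and d side by side below.
    top = [row_a + row_b for row_a, row_b in zip(a, b)]
    bottom = [row_c + row_d for row_c, row_d in zip(c, d)]
    return top + bottom


def untile_2x2(grid):
    """Split a 2x2 grid image back into its four tiles (row-major order)."""
    h = len(grid) // 2
    w = len(grid[0]) // 2
    return [
        [row[:w] for row in grid[:h]],  # top-left
        [row[w:] for row in grid[:h]],  # top-right
        [row[:w] for row in grid[h:]],  # bottom-left
        [row[w:] for row in grid[h:]],  # bottom-right
    ]
```

In such a setup, the images and their inpainting masks would be tiled the same way, the grid would be passed to the inpainting model as one image, and the result would be split back into per-view inpaints. How the approach extends this to more than four images is not specified in the abstract and is not sketched here.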
