RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Abstract

We introduce RealmDreamer, a technique for generating general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize the splats by using state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model conditioned on samples from the inpainting model, which provides rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.
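The initialization step lifts a generated image into 3D. A minimal sketch of one common way to do this follows: pinhole unprojection of a per-pixel depth map into camera-space points. It is an illustration under stated assumptions, not the authors' implementation. The function name `lift_depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth map are all hypothetical, and the abstract does not specify the camera model or how depth is obtained.

```python
import numpy as np

def lift_depth_to_points(depth, fx, fy, cx, cy):
    """Unproject an (H, W) depth map into an (H*W, 3) array of camera-space points.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal point
    (cx, cy), all in pixels. These intrinsics are illustrative assumptions.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Invert the pinhole projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy input: a flat 2x2 depth map at distance 2.0 (hypothetical values).
depth = np.full((2, 2), 2.0)
points = lift_depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each resulting point could serve as the center of one initial Gaussian splat, with its color taken from the corresponding pixel. The occlusion volume and the subsequent inpainting-based optimization are not covered by this sketch.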
