

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

April 10, 2024
作者: Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
cs.AI

Abstract

We introduce RealmDreamer, a technique for generating general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model conditioned on the samples from the inpainting model, which provides rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.
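The abstract describes a four-stage pipeline: initialize splats from a text-to-image sample and its occlusion volume, optimize across views as a 3D inpainting task, add depth-diffusion supervision for geometry, and finetune on sharpened samples. The structural sketch below illustrates that flow only; every function name, data field, and loss value is a hypothetical placeholder, not the authors' code — a real implementation would invoke text-to-image, inpainting, and depth diffusion models at each stage.

```python
# Hypothetical outline of the pipeline stages named in the abstract.
# All identifiers and values here are illustrative placeholders.

def init_splats(prompt):
    """Stage 1: lift a text-to-image sample into initial 3D Gaussian
    splats and record the occlusion volume (regions unseen so far)."""
    return {"prompt": prompt, "splats": "initial", "occlusion_volume": "unseen"}

def inpaint_loss(scene, view):
    """Stage 2 term: score a rendered novel view against an
    image-conditional inpainting diffusion model (dummy scalar)."""
    return 1.0 / (1 + view)

def depth_loss(scene, view):
    """Stage 3 term: score geometric consistency against a depth
    diffusion model conditioned on the inpainted sample (dummy scalar)."""
    return 0.5 / (1 + view)

def optimize(scene, n_views=4):
    """Jointly optimize the splats across views, combining the
    inpainting and depth terms per view."""
    scene["loss"] = sum(
        inpaint_loss(scene, v) + depth_loss(scene, v) for v in range(n_views)
    )
    return scene

def finetune(scene):
    """Stage 4: finetune using sharpened samples from the image generator."""
    scene["finetuned"] = True
    return scene

scene = finetune(optimize(init_splats("a cozy library at dusk")))
```

The key design point the abstract emphasizes is that the multi-view optimization treats unseen regions as inpainting rather than relying on video or multi-view training data, with the depth model supplying geometric supervision.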

