
Disentangled 3D Scene Generation with Layout Learning

February 26, 2024
Authors: Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski
cs.AI

Abstract

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/
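The core mechanism described above is compositing K per-object NeRFs into a scene via a learned layout of rigid transforms. Below is a minimal numpy sketch of that compositing step under stated simplifications: each "object" is a Gaussian density blob standing in for a NeRF MLP, and each layout entry is a translation only (the paper's layouts are full rigid transforms, and the real method optimizes everything with a score-distillation loss from the text-to-image model). All function names here are illustrative, not from the paper's code.

```python
import numpy as np

def make_layout(rng, n_objects):
    """A layout: one transform per object. Simplified to a 3D
    translation; the actual method also learns rotations."""
    return rng.normal(scale=1.0, size=(n_objects, 3))

def object_density(points, center, sigma=0.5):
    """Stand-in for one object's NeRF density query: an isotropic
    Gaussian blob. A real implementation evaluates a learned MLP."""
    d2 = np.sum((points - center) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def composite_density(points, layout, centers):
    """Composite the objects into one scene: map each query point
    into every object's local frame (inverse of its placement),
    query that object's density, and sum the densities."""
    total = np.zeros(points.shape[0])
    for t, c in zip(layout, centers):
        local_points = points - t  # undo the translation placing the object
        total += object_density(local_points, c)
    return total

# Toy usage: two objects at the origin of their local frames,
# placed in the scene by a random layout.
rng = np.random.default_rng(0)
layout = make_layout(rng, 2)
centers = np.zeros((2, 3))
query = layout + centers  # one query point at each placed object's center
density = composite_density(query, layout, centers)
```

During training, the method would render this composited density from random viewpoints and backpropagate an image-distribution loss through both the object networks and the layout parameters; rearranging objects with a different sampled layout must still yield an in-distribution scene, which is what forces the decomposition to align with objects.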