
Disentangled 3D Scene Generation with Layout Learning

February 26, 2024
Authors: Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski
cs.AI

Abstract

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/
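The abstract outlines a joint optimization: several per-object NeRFs are trained from scratch together with a set of layouts (per-object rigid placements), and every composited scene is pushed to look plausible under a pretrained text-to-image model. The following is a minimal, hypothetical PyTorch-style sketch of that loop, not the authors' implementation; `ObjectNeRF`, `render_composite`, and `in_distribution_loss` are stand-in placeholders (the real method uses volume rendering along camera rays and a score-distillation-style loss from the image generator).

```python
# Hypothetical sketch of the joint "objects + layouts" optimization described
# in the abstract. All components below are simplified placeholders.
import torch
import torch.nn as nn

K_OBJECTS = 4   # number of per-object NeRFs (assumed)
N_LAYOUTS = 3   # number of learned layouts arranging those objects (assumed)

class ObjectNeRF(nn.Module):
    """Placeholder per-object radiance field: a small MLP over 3D points."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, points):
        return self.mlp(points)

objects = nn.ModuleList(ObjectNeRF() for _ in range(K_OBJECTS))
# One placement per object, per layout: (tx, ty, tz, rx, ry, rz).
layouts = nn.Parameter(torch.zeros(N_LAYOUTS, K_OBJECTS, 6))

optimizer = torch.optim.Adam([*objects.parameters(), layouts], lr=1e-3)

def render_composite(objects, layout, points):
    """Placeholder compositor: offset each object's query points by its layout
    translation, query its field, and sum the outputs. A real renderer would
    apply full rigid transforms and volume-render along camera rays."""
    out = 0.0
    for obj, params in zip(objects, layout):
        translation = params[:3]
        out = out + obj(points - translation)
    return out

def in_distribution_loss(render, prompt):
    """Placeholder for a score-distillation-style loss from a pretrained
    text-to-image model; a dummy L2 term here so the sketch runs."""
    return render.pow(2).mean()

def training_step(prompt, points):
    optimizer.zero_grad()
    loss = 0.0
    for i in range(N_LAYOUTS):
        # Every layout must yield a valid scene; this is the pressure that
        # makes each NeRF converge to a coherent, independently movable object.
        render = render_composite(objects, layouts[i], points)
        loss = loss + in_distribution_loss(render, prompt)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: random points stand in for camera-ray samples.
points = torch.rand(1024, 3)
for step in range(10):
    training_step("a chef rat standing on a tiny stool", points)
```

The key design point this sketch illustrates is that the same object fields must work under every layout; objects that cannot be rearranged without breaking the scene are penalized, which is what drives the unsupervised disentanglement.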