Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
August 27, 2024
Authors: Abdelrahman Eldesokey, Peter Wonka
cs.AI
Abstract
We propose a diffusion-based approach for Text-to-Image (T2I) generation with
interactive 3D layout control. Layout control has been widely studied to
alleviate the shortcomings of T2I diffusion models in understanding objects'
placement and relationships from text descriptions. Nevertheless, existing
approaches for layout control are limited to 2D layouts, require the user to
provide a static layout beforehand, and fail to preserve generated images under
layout changes. This makes these approaches unsuitable for applications that
require 3D object-wise control and iterative refinements, e.g., interior design
and complex scene generation. To this end, we leverage the recent advancements
in depth-conditioned T2I models and propose a novel approach for interactive 3D
layout control. We replace the traditional 2D boxes used in layout control with
3D boxes. Furthermore, we revamp the T2I task as a multi-stage generation
process, where at each stage, the user can insert, change, and move an object
in 3D while preserving objects from earlier stages. We achieve this through our
proposed Dynamic Self-Attention (DSA) module and the consistent 3D object
translation strategy. Experiments show that our approach can generate
complicated scenes based on 3D layouts, boosting the object generation success
rate over the standard depth-conditioned T2I methods by 2x. Moreover, it
outperforms the compared methods at preserving objects under layout
changes. Project Page: https://abdo-eldesokey.github.io/build-a-scene/
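
As a rough illustration of how a 3D box layout could be turned into an input for a depth-conditioned T2I model, the sketch below renders axis-aligned 3D boxes into a depth map using a pinhole camera at the origin. It is not the authors' implementation: the function names (`ray_box_depth`, `render_depth`), the camera model, the axis-aligned box parametrization, and the assumption that every box lies in front of the camera are all hypothetical choices made for this example.

```python
import math


def ray_box_depth(direction, box_min, box_max):
    """Depth at which a ray from the origin first hits an axis-aligned box.

    Uses the slab method. Returns None if the ray misses the box.
    Assumes direction[2] == 1, so the ray parameter equals the z-depth.
    """
    t_near, t_far = 0.0, math.inf
    for d, lo, hi in zip(direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it misses unless the
            # origin coordinate (0) already lies inside the slab.
            if lo > 0.0 or hi < 0.0:
                return None
            continue
        t0, t1 = lo / d, hi / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near


def render_depth(boxes, width, height, focal):
    """Render a list of (box_min, box_max) pairs into a depth map.

    Pixels not covered by any box keep the value math.inf.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    depth = [[math.inf] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Pinhole ray through pixel (u, v), normalized so that z = 1.
            direction = ((u - cx) / focal, (v - cy) / focal, 1.0)
            for box_min, box_max in boxes:
                z = ray_box_depth(direction, box_min, box_max)
                # Z-buffer: keep the closest surface along this ray.
                if z is not None and z < depth[v][u]:
                    depth[v][u] = z
    return depth


# Hypothetical layout: one box, then the same box moved closer to the camera.
layout = [((-1.0, -1.0, 4.0), (1.0, 1.0, 6.0))]
moved = [((-1.0, -1.0, 2.0), (1.0, 1.0, 4.0))]
depth_before = render_depth(layout, 9, 9, 8.0)
depth_after = render_depth(moved, 9, 9, 8.0)
```

In this toy setting, moving a box along the camera axis changes both its depth values and its projected footprint, which 2D boxes cannot express directly. How the paper actually builds its depth conditioning and preserves objects across stages is described on the project page, not here.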