Generative Blocks World: Moving Things Around in Pictures

Abstract

We describe Generative Blocks World, a method for interacting with the scene of a generated image by manipulating simple geometric abstractions. Our method represents scenes as assemblies of convex 3D primitives. The same scene can be represented by different numbers of primitives, so an editor can move either whole structures or small details. Once the scene geometry has been edited, the image is generated by a flow-based method conditioned on depth and a texture hint. Our texture hint takes the modified 3D primitives into account and exceeds the texture consistency provided by existing key-value caching techniques. These texture hints (a) allow accurate object and camera moves and (b) largely preserve the identity of the depicted objects. Quantitative and qualitative experiments demonstrate that our approach outperforms prior work in visual fidelity, editability, and compositional generalization.
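The following is a minimal illustrative sketch of the geometric side of this pipeline, not the authors' implementation. It assumes that each convex primitive is stored as an intersection of half-spaces (n · x <= d), uses axis-aligned boxes for simplicity, and renders a depth map by ray casting from a pinhole camera at the origin. The function names `make_box`, `translate`, `ray_hit`, and `render_depth` are hypothetical. The abstract only states that the generator is conditioned on depth, so deriving that depth by ray casting the primitives is an assumption made here for illustration.

```python
# Hypothetical sketch: convex primitives as half-space intersections,
# an object move, and a z-buffered depth map for depth conditioning.
import math
from typing import List, Sequence, Tuple

# A half-space n . x <= d, stored as (outward normal, offset).
HalfSpace = Tuple[Tuple[float, float, float], float]
# A convex primitive is the intersection of its half-spaces.
Primitive = List[HalfSpace]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_box(center, half_size) -> Primitive:
    """Axis-aligned box as six half-spaces (the simplest convex primitive)."""
    planes: Primitive = []
    for axis in range(3):
        n_pos = [0.0, 0.0, 0.0]
        n_pos[axis] = 1.0
        planes.append((tuple(n_pos), center[axis] + half_size[axis]))  # x_axis <= c + h
        n_neg = [0.0, 0.0, 0.0]
        n_neg[axis] = -1.0
        planes.append((tuple(n_neg), -(center[axis] - half_size[axis])))  # x_axis >= c - h
    return planes


def translate(prim: Primitive, offset) -> Primitive:
    """Move a primitive by offset o: n . (x - o) <= d becomes n . x <= d + n . o."""
    return [(n, d + dot(n, offset)) for n, d in prim]


def ray_hit(prim: Primitive, origin, direction) -> float:
    """Entry parameter t of the ray origin + t * direction into the primitive, or inf on a miss."""
    t_enter, t_exit = -math.inf, math.inf
    for n, d in prim:
        denom = dot(n, direction)
        num = d - dot(n, origin)
        if denom == 0.0:
            if num < 0.0:  # ray is parallel to this plane and outside it
                return math.inf
        elif denom < 0.0:  # moving along the ray enters this half-space
            t_enter = max(t_enter, num / denom)
        else:  # moving along the ray leaves this half-space
            t_exit = min(t_exit, num / denom)
    if t_enter > t_exit or t_exit < 0.0:
        return math.inf
    return max(t_enter, 0.0)  # 0 if the camera is inside the primitive


def render_depth(prims: List[Primitive], width: int, height: int, focal: float):
    """Z-buffered depth map from a pinhole camera at the origin looking down +z."""
    origin = (0.0, 0.0, 0.0)
    depth = [[math.inf] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # The direction has z = 1, so the ray parameter t equals z-depth.
            direction = ((col + 0.5 - width / 2) / focal,
                         (row + 0.5 - height / 2) / focal,
                         1.0)
            for prim in prims:
                depth[row][col] = min(depth[row][col], ray_hit(prim, origin, direction))
    return depth
```

In this sketch, moving an object means calling `translate` on its primitive and re-rendering the depth map. A pure camera translation can be emulated by translating every primitive by the opposite offset. A scene with more primitives is simply a longer list passed to `render_depth`, which keeps the nearest hit per pixel.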
