Generative Blocks World: Moving Things Around in Pictures

Abstract

We describe Generative Blocks World, a method for interacting with the scene of a generated image by manipulating simple geometric abstractions. Our method represents scenes as assemblies of convex 3D primitives. The same scene can be represented with different numbers of primitives, so an editor can move either whole structures or small details. Once the scene geometry has been edited, the image is generated by a flow-based method conditioned on depth and a texture hint. Our texture hint takes the modified 3D primitives into account and yields better texture consistency than existing key-value caching techniques. These texture hints (a) allow accurate object and camera moves and (b) largely preserve the identity of the depicted objects. Quantitative and qualitative experiments demonstrate that our approach outperforms prior work in visual fidelity, editability, and compositional generalization.
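
To make the notion of editing a scene through convex primitives concrete, the following is a minimal sketch, not the parameterization used in this work. It assumes each primitive is stored as an intersection of half-spaces n . x <= d; the names ConvexPrimitive, unit_cube, and the scene dictionary are hypothetical and introduced only for illustration. Moving an object then amounts to translating its primitive while the rest of the scene is left unchanged.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass(frozen=True)
class ConvexPrimitive:
    """Hypothetical convex primitive: the set {x : n_i . x <= d_i for all i}."""
    normals: Tuple[Vec3, ...]
    offsets: Tuple[float, ...]

    def contains(self, p: Vec3) -> bool:
        # A point is inside iff it satisfies every half-space constraint.
        return all(dot(n, p) <= d for n, d in zip(self.normals, self.offsets))

    def translated(self, t: Vec3) -> "ConvexPrimitive":
        # Shifting the solid by t turns n . x <= d into n . x <= d + n . t.
        new_offsets = tuple(d + dot(n, t) for n, d in zip(self.normals, self.offsets))
        return ConvexPrimitive(self.normals, new_offsets)


def unit_cube() -> ConvexPrimitive:
    """Axis-aligned cube [0, 1]^3 written as six half-spaces."""
    normals = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    offsets = (1, 0, 1, 0, 1, 0)
    return ConvexPrimitive(normals, offsets)


# A toy scene with two primitives; the labels are illustrative only.
scene = {"chair": unit_cube(), "table": unit_cube().translated((3, 0, 0))}

# An edit: move the "chair" primitive two units along z, leave the rest alone.
edited = dict(scene, chair=scene["chair"].translated((0, 0, 2)))
```

In this sketch the edited primitives would be what a downstream generator is conditioned on, for example through a rendered depth map. That rendering step and the texture hint are outside the scope of the example.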