Generative Blocks World: Moving Things Around in Pictures
June 25, 2025
Authors: Vaibhav Vavilala, Seemandhar Jain, Rahul Vasanth, D. A. Forsyth, Anand Bhattad
cs.AI
Abstract
We describe Generative Blocks World, a method for interacting with the scene of a
generated image by manipulating simple geometric abstractions. Our method represents
scenes as assemblies of convex 3D primitives, and the same scene can be
represented by different numbers of primitives, allowing an editor to move
either whole structures or small details. Once the scene geometry has been
edited, the image is generated by a flow-based method which is conditioned on
depth and a texture hint. Our texture hint takes into account the modified 3D
primitives, exceeding the texture consistency provided by existing key-value
caching techniques. These texture hints (a) allow accurate object and camera
moves and (b) largely preserve the identity of objects depicted. Quantitative
and qualitative experiments demonstrate that our approach outperforms prior
work in visual fidelity, editability, and compositional generalization.
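The abstract does not say how the convex primitives are parameterized or how the depth condition is rendered from them. The sketch below is a minimal illustration under stated assumptions. Each primitive is assumed to be an intersection of half-spaces, and the depth condition is assumed to be a per-pixel ray-cast depth map from a pinhole camera. It shows how moving a primitive changes the depth the generator would be conditioned on. The names `ConvexPrimitive`, `box`, and `render_depth` are hypothetical. This is not the authors' code, and neither the texture hint nor the flow-based generator is modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class ConvexPrimitive:
    # Hypothetical parameterization (an assumption, not from the paper):
    # the solid is every point x with dot(n, x) + d <= 0 for all (n, d).
    planes: List[Tuple[Vec3, float]]

    def translated(self, v: Vec3) -> "ConvexPrimitive":
        # Shifting the solid by v maps n.x + d <= 0 to n.(x - v) + d <= 0,
        # so each offset becomes d - n.v while the normals stay the same.
        return ConvexPrimitive([(n, d - dot(n, v)) for n, d in self.planes])

    def ray_hit(self, o: Vec3, r: Vec3) -> Optional[float]:
        # Clip the ray o + t*r against every half-space (Cyrus-Beck style).
        t_enter, t_exit = -math.inf, math.inf
        for n, d in self.planes:
            denom = dot(n, r)
            num = dot(n, o) + d
            if abs(denom) < 1e-12:
                if num > 0:  # ray parallel to the plane and outside it
                    return None
                continue
            t = -num / denom
            if denom < 0:
                t_enter = max(t_enter, t)  # ray enters this half-space at t
            else:
                t_exit = min(t_exit, t)  # ray leaves this half-space at t
        if t_enter > t_exit or t_enter < 0:
            return None  # miss, or camera inside/behind the primitive
        return t_enter


def box(center: Vec3, half: float) -> ConvexPrimitive:
    # Axis-aligned cube as six half-spaces: dot(n, x - center) - half <= 0.
    normals = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
               (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    return ConvexPrimitive([(n, -dot(n, center) - half) for n in normals])


def render_depth(prims: List[ConvexPrimitive], w: int, h: int,
                 f: float) -> List[List[float]]:
    # Pinhole camera at the origin looking down +z. Every ray direction has
    # z-component 1, so the hit parameter t equals the depth z directly.
    cx, cy = (w - 1) / 2, (h - 1) / 2
    depth = [[math.inf] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            r = ((u - cx) / f, (v - cy) / f, 1.0)
            for p in prims:
                t = p.ray_hit((0.0, 0.0, 0.0), r)
                if t is not None:
                    depth[v][u] = min(depth[v][u], t)  # keep nearest surface
    return depth


# Usage: one cube 5 units in front of the camera, then pushed back by 2.
scene = [box((0.0, 0.0, 5.0), 1.0)]
before = render_depth(scene, 5, 5, 5.0)
after = render_depth([scene[0].translated((0.0, 0.0, 2.0))], 5, 5, 5.0)
print(before[2][2], after[2][2])  # 4.0 6.0
```

Editing in this toy setting only rewrites the plane offsets. The re-rendered depth map then reflects the move: the centre pixel goes from 4.0 to 6.0, and corner pixels that miss the cube stay at infinity. A finer scene decomposition would simply be a longer `scene` list. That is one plausible way a single scene could be represented by different numbers of primitives.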