ChatPaper.ai


Generative Photomontage

August 13, 2024
Authors: Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu
cs.AI

Abstract

Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images generated by ControlNet using the same input condition and different seeds, we let users select desired parts from the generated results using a brush stroke interface. We introduce a novel technique that takes in the user's brush strokes, segments the generated images using a graph-based optimization in diffusion feature space, and then composites the segmented regions via a new feature-space blending method. Our method faithfully preserves the user-selected regions while compositing them harmoniously. We demonstrate that our flexible framework can be used for many applications, including generating new appearance combinations, fixing incorrect shapes and artifacts, and improving prompt alignment. We show compelling results for each application and demonstrate that our method outperforms existing image blending methods and various baselines.
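The abstract's pipeline (select parts of a ControlNet image stack with brush strokes, segment via an optimization over per-pixel labels, then composite) can be illustrated with a toy sketch. The snippet below is a minimal stand-in, not the paper's method: it replaces diffusion features with arbitrary feature maps, the graph-based optimization with a nearest-mean unary cost plus a majority-vote smoothing pass, and the feature-space blending with plain pixel selection. The function name and all parameters are hypothetical.

```python
import numpy as np

def composite_from_strokes(features, images, stroke_masks, smooth_iters=5):
    """Toy stroke-driven compositing over a stack of K generated images.

    features:     (K, H, W, C) per-image feature maps (stand-in for diffusion features)
    images:       (K, H, W, 3) the generated image stack
    stroke_masks: (K, H, W) bool, user strokes selecting parts of each image
    Returns the composited (H, W, 3) image and the (H, W) source-label map.
    """
    K, H, W, C = features.shape

    # Unary cost: distance from each pixel's feature in image k to the mean
    # feature under that image's strokes (images with no strokes get inf).
    cost = np.full((K, H, W), np.inf)
    for k in range(K):
        if not stroke_masks[k].any():
            continue
        mean = features[k][stroke_masks[k]].mean(axis=0)
        cost[k] = np.linalg.norm(features[k] - mean, axis=-1)
    labels = cost.argmin(axis=0)

    # Stroked pixels are pinned to the image the user brushed them on.
    pinned = stroke_masks.argmax(axis=0)
    has_stroke = stroke_masks.any(axis=0)
    labels[has_stroke] = pinned[has_stroke]

    # Crude smoothing (stand-in for the pairwise term of a graph cut):
    # each pixel adopts the majority label of its 3x3 neighborhood.
    for _ in range(smooth_iters):
        votes = np.zeros((K, H, W))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                shifted = np.roll(np.roll(labels, dy, axis=0), dx, axis=1)
                for k in range(K):
                    votes[k] += (shifted == k)
        labels = votes.argmax(axis=0)
        labels[has_stroke] = pinned[has_stroke]

    # Composite: take each pixel from its assigned source image.
    rows, cols = np.indices((H, W))
    return images[labels, rows, cols], labels
```

The paper instead performs this label assignment as a graph-based optimization in diffusion feature space and blends the resulting regions in feature space rather than in pixels, which is what lets the selected parts merge harmoniously.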

