
Make It Count: Text-to-Image Generation with an Accurate Number of Objects

June 14, 2024
Authors: Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik
cs.AI

Abstract

Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects using text is surprisingly hard. This matters for a range of applications, from technical documents to children's books to illustrated cooking recipes. Generating the correct number of objects is fundamentally challenging because the generative model needs to maintain a sense of separate identity for every instance of the object, even if several objects look identical or overlap, and then carry out a global computation implicitly during generation. It is still unknown whether such representations exist. To address count-correct generation, we first identify features within the diffusion model that can carry object identity information. We then use them to separate and count instances of objects during the denoising process and to detect over-generation and under-generation. We fix the latter by training a model that predicts both the shape and location of a missing object, based on the layout of existing ones, and show how it can be used to guide denoising toward the correct object count. Our approach, CountGen, does not depend on an external source to determine object layout, but rather uses the prior from the diffusion model itself, creating prompt-dependent and seed-dependent layouts. Evaluated on two benchmark datasets, we find that CountGen strongly outperforms existing baselines in count accuracy.
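To make the "separate and count instances, then detect over-generation and under-generation" step concrete, here is a minimal sketch in Python. It is not CountGen's method: the abstract says instances are separated using features inside the diffusion model, whereas this sketch assumes a binary object mask is already available and uses plain 4-connected component labeling as a stand-in. The names `count_instances` and `check_count` are hypothetical and chosen for illustration only.

```python
from collections import deque


def count_instances(mask):
    """Count 4-connected blobs of truthy cells in a 2D grid (list of lists).

    Assumption: each blob corresponds to one object instance. Touching or
    overlapping objects merge into one blob, which is exactly the case the
    abstract identifies as hard.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New blob found: flood-fill it with breadth-first search.
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def check_count(mask, target):
    """Compare the number of detected blobs with the requested object count."""
    found = count_instances(mask)
    if found > target:
        return "over-generation"
    if found < target:
        return "under-generation"
    return "correct"
```

With a mask containing two separate blobs, `check_count(mask, 3)` reports under-generation, the case the abstract says is repaired by predicting the shape and location of the missing object. Connected components cannot tell apart instances that touch, which is why identity-carrying features are needed in the real setting.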

