

Iterative Object Count Optimization for Text-to-image Diffusion Models

August 21, 2024
Authors: Oz Zafar, Lior Wolf, Idan Schwartz
cs.AI

Abstract

We address a persistent challenge in text-to-image models: accurately generating a specified number of objects. Current models, which learn from image-text pairs, inherently struggle with counting, because training data cannot depict every possible count of any given object. To solve this, we propose optimizing the generated image based on a counting loss derived from a counting model that aggregates an object's potential. Employing an out-of-the-box counting model is challenging for two reasons: first, the model requires a scaling hyperparameter for the potential aggregation that varies with the viewpoint of the objects; second, classifier guidance techniques require modified models that operate on noisy intermediate diffusion steps. To address these challenges, we propose an iterated online training mode that improves the accuracy of inferred images while altering the text conditioning embedding and dynamically adjusting hyperparameters. Our method offers three key advantages: (i) it can consider non-derivable counting techniques based on detection models, (ii) it is a zero-shot plug-and-play solution that facilitates rapid changes to the counting techniques and image generation methods, and (iii) the optimized counting token can be reused to generate accurate images without additional optimization. We evaluate the generation of various objects and show significant improvements in accuracy. The project page is available at https://ozzafar.github.io/count_token.
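The core idea in the abstract, updating only a counting-token embedding so that a counting loss on the generated output shrinks, can be pictured with a toy sketch. Everything in it is an assumption made for illustration: `generate_potential_map`, `estimate_count`, `counting_loss`, and `optimize_count_token` are hypothetical names, the "generator" is a simple quadratic function rather than a diffusion model, the "counter" just sums a potential map and divides by a fixed scale, and the gradient is written by hand. It is not the authors' implementation, and it leaves out the dynamic hyperparameter adjustment the abstract mentions.

```python
# Toy sketch only: the generator and counter are hypothetical stand-ins,
# not the diffusion model or counting model described in the abstract.

def generate_potential_map(token, size=8):
    """Hypothetical frozen generator: turns a token embedding into a
    non-negative per-pixel potential map (total mass = sum of squares)."""
    strength = sum(w * w for w in token)
    return [[strength / (size * size)] * size for _ in range(size)]


def estimate_count(potential_map, scale=1.0):
    """Hypothetical counter: aggregate the potential map, then divide by
    a scaling hyperparameter (held fixed in this sketch)."""
    return sum(sum(row) for row in potential_map) / scale


def counting_loss(token, target, scale=1.0):
    """Squared error between the estimated count and the requested count."""
    count = estimate_count(generate_potential_map(token), scale)
    return (count - target) ** 2


def optimize_count_token(target, scale=1.0, steps=200, lr=0.01):
    """Gradient descent on the token only; the generator stays frozen."""
    token = [0.5, 0.5, 0.5, 0.5]  # arbitrary starting embedding
    for _ in range(steps):
        count = estimate_count(generate_potential_map(token), scale)
        # Hand-derived gradient for this toy generator:
        # d(loss)/d(w_i) = 2 * (count - target) * (2 * w_i / scale)
        token = [w - lr * 2.0 * (count - target) * 2.0 * w / scale
                 for w in token]
    return token


if __name__ == "__main__":
    token = optimize_count_token(target=5)
    # The optimized token is reused here without any further optimization.
    print(round(estimate_count(generate_potential_map(token)), 3))
```

In a realistic setting an automatic differentiation framework would supply the gradient, and the step size would need tuning; the fixed learning rate above is only suitable for small target counts in this toy setup.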

