Iterative Object Count Optimization for Text-to-image Diffusion Models
August 21, 2024
Authors: Oz Zafar, Lior Wolf, Idan Schwartz
cs.AI
Abstract
We address a persistent challenge in text-to-image models: accurately
generating a specified number of objects. Current models, which learn from
image-text pairs, inherently struggle with counting, as training data cannot
depict every possible count of any given object. To solve this, we
propose optimizing the generated image based on a counting loss derived from a
counting model that aggregates an object's potential. Employing an
out-of-the-box counting model is challenging for two reasons: first, the model
requires a scaling hyperparameter for the potential aggregation that varies
depending on the viewpoint of the objects, and second, classifier guidance
techniques require modified models that operate on noisy intermediate diffusion
steps. To address these challenges, we propose an iterated online training mode
that improves the accuracy of inferred images while altering the text
conditioning embedding and dynamically adjusting hyperparameters. Our method
offers three key advantages: (i) it can consider non-differentiable counting
techniques based on detection models, (ii) it is a zero-shot plug-and-play
solution facilitating rapid changes to the counting techniques and image
generation methods, and (iii) the optimized counting token can be reused to
generate accurate images without additional optimization. We evaluate the
generation of various objects and show significant improvements in accuracy.
The project page is available at https://ozzafar.github.io/count_token.
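
The loop described in the abstract (generate, count, update the conditioning embedding, re-tune the scale) can be illustrated with a toy sketch. This is not the authors' implementation: `generate` is a hypothetical stand-in for a diffusion model, `detect_count` is a hypothetical thresholding detector, `viewpoint_factor` simulates the unknown scale of the aggregated potential, and recalibrating the scale from the detector's count is an assumption made here for illustration only.

```python
import math

def generate(embedding, n_slots=12):
    # Toy stand-in for the diffusion model (assumption): slot k holds an
    # object potential in (0, 1) that rises as the embedding passes k.
    return [1.0 / (1.0 + math.exp(k - embedding)) for k in range(n_slots)]

def detect_count(potentials, threshold=0.5):
    # Non-differentiable, detector-style count: slots above a threshold.
    return sum(1 for p in potentials if p > threshold)

def optimize_count_token(target, viewpoint_factor=60.0, lr=0.05, max_iters=500):
    embedding, scale = 0.5, 1.0
    for _ in range(max_iters):
        potentials = generate(embedding)
        detected = detect_count(potentials)
        if detected == target:
            break
        # Aggregated potential; its magnitude depends on the unknown viewpoint.
        raw = viewpoint_factor * sum(potentials)
        # Assumed rule: recalibrate the scaling hyperparameter so the
        # differentiable estimate agrees with the detector on this image.
        if detected > 0:
            scale = raw / detected
        estimate = raw / scale
        # Gradient of the counting loss (estimate - target)^2 with respect
        # to the embedding, holding the scale fixed for this step.
        d_raw = viewpoint_factor * sum(p * (1.0 - p) for p in potentials)
        grad = 2.0 * (estimate - target) * d_raw / scale
        embedding -= lr * grad
    return embedding, detect_count(generate(embedding))

token, count = optimize_count_token(target=5)
# Reusing the optimized embedding needs no further optimization.
reused_count = detect_count(generate(token))
```

In this toy setting the optimized `token` plays the role of the reusable counting token: passing it to `generate` again yields the same count without another optimization run, and the result does not depend on the value chosen for `viewpoint_factor` because the scale is re-estimated at every iteration.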