Make It Count: Text-to-Image Generation with an Accurate Number of Objects

June 14, 2024
Authors: Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik
cs.AI

Abstract

Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects through text is surprisingly hard. Accurate counts matter for a range of applications, from technical documents to children's books to illustrated cooking recipes. Generating the correct number of objects is fundamentally challenging because the generative model must keep a sense of separate identity for every instance of the object, even when several objects look identical or overlap, and must then carry out a global computation implicitly during generation. It is still unknown whether such representations exist. To address count-correct generation, we first identify features within the diffusion model that can carry object identity information. We then use them to separate and count object instances during the denoising process and to detect over-generation and under-generation. We fix the latter by training a model that predicts both the shape and location of a missing object based on the layout of existing ones, and show how it can be used to guide denoising toward the correct object count. Our approach, CountGen, does not depend on an external source to determine object layout; instead, it uses the prior from the diffusion model itself, creating prompt-dependent and seed-dependent layouts. Evaluated on two benchmark datasets, CountGen strongly outperforms existing baselines in count accuracy.
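
The count-then-diagnose step described in the abstract can be illustrated with a toy sketch. This is not CountGen's method: the abstract says instances are separated using features from inside the diffusion model, whereas the sketch below swaps in plain connected-component labeling on a binary foreground mask. The function names `count_instances` and `diagnose` and the example mask are hypothetical, chosen only for illustration.

```python
from collections import deque


def count_instances(mask):
    """Count 4-connected foreground regions in a 2D grid of 0/1 values.

    Stand-in for instance separation; NOT the feature-based approach
    the abstract describes.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New region found: flood-fill it with breadth-first search.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def diagnose(mask, target):
    """Compare the detected count with the count requested in the prompt."""
    found = count_instances(mask)
    if found > target:
        return "over-generation"
    if found < target:
        return "under-generation"
    return "correct"


# Hypothetical mask containing three separate blobs.
example_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_instances(example_mask), diagnose(example_mask, 4))
```

Connected components merge touching or overlapping objects into a single region, which is exactly the failure case the abstract highlights and a plausible reason the authors rely on identity-carrying features instead.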
