Improving Text-to-Image Consistency via Automatic Prompt Optimization
March 26, 2024
Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal
cs.AI
Abstract
Impressive advances in text-to-image (T2I) generative models have yielded a
plethora of high-performing models that are able to generate aesthetically
appealing, photorealistic images. Despite the progress, these models still
struggle to produce images that are consistent with the input prompt,
oftentimes failing to capture object quantities, relations and attributes
properly. Existing solutions to improve prompt-image consistency suffer from
the following challenges: (1) they oftentimes require model fine-tuning, (2)
they only focus on nearby prompt samples, and (3) they are affected by
unfavorable trade-offs among image quality, representation diversity, and
prompt-image consistency. In this paper, we address these challenges and
introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a
large language model (LLM) to improve prompt-image consistency in T2I models.
Our framework starts from a user prompt and iteratively generates revised
prompts with the goal of maximizing a consistency score. Our extensive
validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost
the initial consistency score by up to 24.9% in terms of DSG score while
preserving the FID and increasing the recall between generated and real data.
Our work paves the way toward building more reliable and robust T2I systems by
harnessing the power of LLMs.
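The abstract describes OPT2I only at a high level: starting from the user prompt, an LLM repeatedly proposes revised prompts, and the framework keeps whichever revision maximizes a consistency score. The following is a minimal Python sketch of such an optimization-by-prompting loop. It is not the authors' implementation, and every name in it (`optimize_prompt`, `toy_t2i`, `make_toy_scorer`, `toy_proposer`) is hypothetical. The toy "T2I model", which renders only the first two comma-separated phrases, and the phrase-matching scorer are assumed stand-ins for a real T2I model scored with DSG. The phrase-merging proposer is an assumed stand-in for the LLM. In this sketch, the checks are always derived from the original user prompt, because the goal is consistency with what the user asked for and not with the rewritten prompt.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str], float]                               # prompt -> consistency score
Proposer = Callable[[List[Tuple[str, float]], int], List[str]]  # (history, n) -> new prompts


def optimize_prompt(user_prompt: str, score: Scorer, propose: Proposer,
                    iterations: int = 5, candidates_per_iter: int = 4,
                    history_size: int = 5) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Hypothetical optimization-by-prompting loop (a sketch, not OPT2I itself)."""
    history = [(user_prompt, score(user_prompt))]
    seen = {user_prompt}
    for _ in range(iterations):
        # Show the proposer the top-scoring prompts so far, in ascending score order
        # (stable sort, so ties keep insertion order); the best prompt comes last.
        context = sorted(history, key=lambda ps: ps[1])[-history_size:]
        for candidate in propose(context, candidates_per_iter):
            if candidate not in seen:  # score each distinct revision only once
                seen.add(candidate)
                history.append((candidate, score(candidate)))
    # max() returns the first maximal entry, so the user prompt wins if nothing improves on it.
    best_prompt, best_score = max(history, key=lambda ps: ps[1])
    return best_prompt, best_score, history


def toy_t2i(prompt: str) -> str:
    """Hypothetical T2I stand-in: 'renders' only the first two comma-separated phrases."""
    return ", ".join(p.strip() for p in prompt.split(",")[:2])


def make_toy_scorer(user_prompt: str) -> Scorer:
    """Hypothetical DSG-like scorer: one yes/no check per phrase of the ORIGINAL prompt."""
    checks = [p.strip() for p in user_prompt.split(",")]

    def score(prompt: str) -> float:
        image = toy_t2i(prompt)  # generate from the revised prompt...
        return sum(c in image for c in checks) / len(checks)  # ...check against the original

    return score


def toy_proposer(context: List[Tuple[str, float]], n: int) -> List[str]:
    """Hypothetical LLM stand-in: rewrites the best prompt by merging adjacent phrases."""
    phrases = [p.strip() for p in context[-1][0].split(",")]
    revisions = []
    for i in range(min(n, len(phrases) - 1)):
        merged = phrases[:i] + [phrases[i] + " and " + phrases[i + 1]] + phrases[i + 2:]
        revisions.append(", ".join(merged))
    return revisions


if __name__ == "__main__":
    user_prompt = "a cat, a dog, a red ball"
    best, best_score, _ = optimize_prompt(user_prompt, make_toy_scorer(user_prompt), toy_proposer)
    print(best, best_score)  # → a cat and a dog, a red ball 1.0
```

In this toy setting, the user prompt scores 2/3 because the third object is dropped. The first round of revisions folds it into the rendered phrases and reaches a score of 1.0. Keeping the full scored history, rather than only the latest prompt, lets the proposer see which rewrites helped. It also guarantees that the returned prompt never scores below the user's original.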