Improving Text-to-Image Consistency via Automatic Prompt Optimization

March 26, 2024
Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal
cs.AI

Abstract

Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high-performing models that can generate aesthetically appealing, photorealistic images. Despite this progress, these models still struggle to produce images that are consistent with the input prompt, often failing to capture object quantities, relations, and attributes properly. Existing solutions for improving prompt-image consistency face three challenges: (1) they often require model fine-tuning, (2) they focus only on nearby prompt samples, and (3) they suffer from unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models. Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score. Our extensive validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost the initial consistency score by up to 24.9%, as measured by DSG score, while preserving the FID and increasing the recall between generated and real data. Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs.
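The iterative loop described in the abstract (start from the user prompt, propose revised prompts, keep whichever maximizes a consistency score) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the LLM is queried, how many images are generated per prompt, or how scores are aggregated. The names `optimize_prompt`, `toy_propose`, and `toy_score` are hypothetical, and the two toy functions are deterministic stand-ins for the LLM and for the T2I model plus consistency scorer.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (prompt, consistency score), best first


def optimize_prompt(
    user_prompt: str,
    propose: Callable[[History], List[str]],
    score: Callable[[str, str], float],
    iterations: int = 5,
) -> Tuple[str, float]:
    """Return the highest-scoring revised prompt found within the budget."""
    history: History = [(user_prompt, score(user_prompt, user_prompt))]
    seen = {user_prompt}
    for _ in range(iterations):
        # The proposer (an LLM in the paper) sees past prompts and scores.
        for candidate in propose(history):
            if candidate in seen:
                continue
            seen.add(candidate)
            # Consistency is always measured against the ORIGINAL user prompt.
            history.append((candidate, score(user_prompt, candidate)))
        history.sort(key=lambda pair: pair[1], reverse=True)
    return history[0]


# --- Hypothetical toy stand-ins, for illustration only ---
KEYWORDS = {"two", "red", "cubes"}


def toy_score(user_prompt: str, candidate: str) -> float:
    """Assumed toy metric: pretend the T2I model only 'renders' the first
    four words of a prompt, then report the fraction of the user's
    keywords that were rendered."""
    rendered = set(candidate.split()[:4])
    wanted = KEYWORDS & set(user_prompt.split())
    return len(wanted & rendered) / len(wanted)


def toy_propose(history: History) -> List[str]:
    """Assumed toy proposer: drop the first word of the current best prompt."""
    words = history[0][0].split()
    return [" ".join(words[1:])] if len(words) > 1 else []


best_prompt, best_score = optimize_prompt(
    "a photo of two red cubes", toy_propose, toy_score
)
print(best_prompt, best_score)
```

In a real setup, `score` would wrap image generation with the T2I model and a consistency metric such as DSG, and `propose` would wrap an LLM call. Keeping both behind plain callables means the loop itself needs no access to model weights, in line with the abstract's point that fine-tuning is not required.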
