Prompt Expansion for Adaptive Text-to-Image Generation

Abstract

Text-to-image generation models are powerful but difficult to use. Users must craft specific prompts to get better images, yet the resulting images can still be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts, optimized so that, when passed to a text-to-image model, they generate a wider variety of appealing images. We conduct a human evaluation study showing that images generated through Prompt Expansion are more aesthetically pleasing and more diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
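
The data flow described in the abstract (one query in, several expanded prompts out, each passed to a text-to-image model) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `expand_prompt`, `generate_images`, and the fixed `STYLE_MODIFIERS` list are hypothetical, and the fixed list merely stands in for the learned Prompt Expansion model, whose actual design is not specified in this abstract.

```python
from typing import Callable, List

# Hypothetical modifier phrases. The paper's model learns its expansions;
# this fixed list is only a stand-in so the data flow can be shown.
STYLE_MODIFIERS = [
    "golden hour lighting, wide-angle photograph",
    "watercolor illustration, soft pastel palette",
    "macro shot, shallow depth of field",
    "isometric 3D render, studio lighting",
]


def expand_prompt(query: str, n: int = 4) -> List[str]:
    """Return up to n distinct expanded prompts for a short query."""
    if n < 1:
        raise ValueError("n must be at least 1")
    query = query.strip()
    # Clamp n so every returned prompt uses a different modifier.
    n = min(n, len(STYLE_MODIFIERS))
    return [f"{query}, {STYLE_MODIFIERS[i]}" for i in range(n)]


def generate_images(
    query: str,
    text_to_image: Callable[[str], object],
    n: int = 4,
) -> List[object]:
    """Expand the query, then call the supplied model once per expanded prompt."""
    return [text_to_image(prompt) for prompt in expand_prompt(query, n)]


if __name__ == "__main__":
    # A stub replaces the real text-to-image model in this sketch.
    def fake_model(prompt: str) -> str:
        return f"<image for: {prompt}>"

    for image in generate_images("a lighthouse", fake_model, n=3):
        print(image)
```

Passing the text-to-image model in as a callable keeps the expansion step independent of any particular image generator, which matches the abstract's framing of expansion as a separate stage in front of the model.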