Prompt Expansion for Adaptive Text-to-Image Generation

Abstract

Text-to-image generation models are powerful but difficult to use. Users must craft specific prompts to get better images, and even then the resulting images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts, optimized so that, when passed to a text-to-image model, they generate a wider variety of appealing images. We conduct a human evaluation study showing that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
