NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
November 20, 2023
Authors: Shachar Rosenman, Vasudev Lal, Phillip Howard
cs.AI
Abstract
Despite impressive recent advances in text-to-image diffusion models,
obtaining high-quality images often requires prompt engineering by humans who
have developed expertise in using them. In this work, we present NeuroPrompts,
an adaptive framework that automatically enhances a user's prompt to improve
the quality of generations produced by text-to-image models. Our framework
utilizes constrained text decoding with a pre-trained language model that has
been adapted to generate prompts similar to those produced by human prompt
engineers. This approach enables higher-quality text-to-image generations and
provides user control over stylistic features via constraint set specification.
We demonstrate the utility of our framework by creating an interactive
application for prompt enhancement and image generation using Stable Diffusion.
Additionally, we conduct experiments utilizing a large dataset of
human-engineered prompts for text-to-image generation and show that our
approach automatically produces enhanced prompts that result in superior image
quality. We make our code, a screencast video demo and a live demo instance of
NeuroPrompts publicly available.
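
The constrained decoding idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified constrained beam search over a hypothetical toy next-token table (`TOY_LM`) that stands in for the adapted language model. The function name `enhance_prompt`, the vocabulary, and the probabilities are all assumptions made for illustration. Each constraint set counts as satisfied when at least one of its tokens appears in the generated suffix, which is one plausible reading of "constraint set specification".

```python
import math

# Hypothetical toy "language model": next-token probabilities given the
# previous token. A real system would query a pre-trained LM that has been
# adapted to produce prompt-engineer style text.
TOY_LM = {
    "<user>": {"highly": 0.5, "oil": 0.2, "watercolor": 0.1, "artstation": 0.2},
    "highly": {"detailed": 0.9, "<eos>": 0.1},
    "detailed": {"oil": 0.3, "watercolor": 0.2, "artstation": 0.3, "<eos>": 0.2},
    "oil": {"painting": 1.0},
    "painting": {"artstation": 0.4, "highly": 0.2, "<eos>": 0.4},
    "watercolor": {"artstation": 0.4, "highly": 0.2, "<eos>": 0.4},
    "artstation": {"<eos>": 1.0},
}


def satisfied(tokens, constraint_sets):
    """Count constraint sets with at least one member among the tokens."""
    return sum(1 for cs in constraint_sets if any(t in cs for t in tokens))


def enhance_prompt(user_prompt, constraint_sets, beam_width=3, max_len=6):
    """Append a suffix that satisfies every constraint set, if one is found."""
    beams = [(0.0, ["<user>"])]  # (log-probability, generated tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, toks in beams:
            for nxt, p in TOY_LM.get(toks[-1], {"<eos>": 1.0}).items():
                cand = (logp + math.log(p), toks + [nxt])
                (finished if nxt == "<eos>" else candidates).append(cand)
        # Prefer hypotheses that satisfy more constraint sets, then likelihood.
        candidates.sort(
            key=lambda c: (satisfied(c[1], constraint_sets), c[0]), reverse=True
        )
        beams = candidates[:beam_width]
        if not beams:
            break
    complete = [
        f for f in finished
        if satisfied(f[1], constraint_sets) == len(constraint_sets)
    ]
    if not complete:
        return user_prompt  # no suffix met the constraints; leave prompt as-is
    best = max(complete, key=lambda c: c[0])
    suffix = [t for t in best[1] if t not in ("<user>", "<eos>")]
    return user_prompt + ", " + " ".join(suffix)


if __name__ == "__main__":
    # One style constraint set and one platform-tag constraint set.
    print(enhance_prompt("a cat", [{"oil", "watercolor"}, {"artstation"}]))
```

Ranking hypotheses by the number of satisfied constraint sets before likelihood keeps constraint-satisfying candidates in the beam even when the model assigns them lower probability. That is the basic trade-off any constrained decoder has to manage. Passing different constraint sets is how a user would steer stylistic features in this sketch.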