NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Abstract

Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using these models. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of the images produced by text-to-image models. Our framework uses constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those written by human prompt engineers. This approach enables higher-quality text-to-image generation and gives users control over stylistic features through the specification of constraint sets. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments using a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code, a screencast video demo and a live demo instance of NeuroPrompts publicly available.
