Prompt Expansion for Adaptive Text-to-Image Generation

Abstract

Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the resulting images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts, optimized so that, when passed to a text-to-image model, they generate a wider variety of appealing images. A human evaluation study shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
