RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

Abstract

Despite recent progress in text-to-image (T2I) generation, existing models often struggle to faithfully capture user intentions from short and under-specified prompts. While prior work has attempted to enhance prompts using large language models (LLMs), these methods frequently generate stylistic or unrealistic content due to insufficient grounding in visual semantics and real-world composition. Inspired by recent advances in reasoning for language models, we propose RePrompt, a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Instead of relying on handcrafted rules or stylistic rewrites, our method trains a language model to generate structured, self-reflective prompts by optimizing for image-level outcomes. Tailored reward models assess the generated images in terms of human preference, semantic alignment, and visual composition, providing indirect supervision to refine prompt generation. Our approach enables end-to-end training without human-annotated data. Experiments on GenEval and T2I-CompBench show that RePrompt significantly boosts spatial layout fidelity and compositional generalization across diverse T2I backbones, establishing new state-of-the-art results.
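The abstract describes image-level rewards on three axes that indirectly supervise the prompt-rewriting model, but it does not state how the scores are combined or which reinforcement-learning algorithm is used. The following is a minimal sketch under stated assumptions. It assumes that each image receives three scores in [0, 1] and that these are combined by a weighted sum. It also assumes that several rewrites of the same user prompt are compared using group-relative advantage normalization, a GRPO-style choice made here for illustration. The names `RewardScores`, `combined_reward`, `group_advantages`, and `WEIGHTS` are hypothetical. The reward models and the T2I generator are not shown.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardScores:
    """Hypothetical per-image scores, each assumed to lie in [0, 1]."""
    preference: float   # stand-in for a human-preference reward model
    alignment: float    # stand-in for a text-image semantic-alignment score
    composition: float  # stand-in for a visual-composition / layout score


# Hypothetical equal weights; the abstract does not specify a weighting.
WEIGHTS = {"preference": 1.0, "alignment": 1.0, "composition": 1.0}


def combined_reward(s: RewardScores, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the three reward signals for a single generated image."""
    return (weights["preference"] * s.preference
            + weights["alignment"] * s.alignment
            + weights["composition"] * s.composition)


def group_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize rewards within a group of rewrites of the same user prompt.

    Rewrites whose images score above the group mean get positive advantages
    (their token probabilities would be pushed up by a policy-gradient step);
    those below the mean get negative advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three sampled rewrites of one short user prompt, scored after generation.
    scores = [
        RewardScores(0.8, 0.9, 0.7),
        RewardScores(0.5, 0.6, 0.4),
        RewardScores(0.6, 0.7, 0.9),
    ]
    rewards = [combined_reward(s) for s in scores]
    print(group_advantages(rewards))
```

Because only relative quality within a group matters, this normalization needs no absolute reward scale and no human-annotated target prompts. That property is consistent with the end-to-end, annotation-free training the abstract describes. However, the actual RePrompt objective may differ.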
