
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

May 23, 2025
Authors: Mingrui Wu, Lu Wang, Pu Zhao, Fangkai Yang, Jianjin Zhang, Jianfeng Liu, Yuefeng Zhan, Weihao Han, Hao Sun, Jiayi Ji, Xiaoshuai Sun, Qingwei Lin, Weiwei Deng, Dongmei Zhang, Feng Sun, Qi Zhang, Rongrong Ji
cs.AI

Abstract

Despite recent progress in text-to-image (T2I) generation, existing models often struggle to faithfully capture user intentions from short and under-specified prompts. While prior work has attempted to enhance prompts using large language models (LLMs), these methods frequently generate stylistic or unrealistic content due to insufficient grounding in visual semantics and real-world composition. Inspired by recent advances in reasoning for language models, we propose RePrompt, a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Instead of relying on handcrafted rules or stylistic rewrites, our method trains a language model to generate structured, self-reflective prompts by optimizing for image-level outcomes. The tailored reward models assess the generated images in terms of human preference, semantic alignment, and visual composition, providing indirect supervision to refine prompt generation. Our approach enables end-to-end training without human-annotated data. Experiments on GenEval and T2I-CompBench show that RePrompt significantly boosts spatial layout fidelity and compositional generalization across diverse T2I backbones, establishing new state-of-the-art results.
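
The image-level supervision described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the reward formula, the weights, or the reinforcement learning algorithm, so the weighted average, the mean baseline, and the names `RewardWeights`, `combined_reward`, and `group_advantages` are hypothetical. It only shows how three per-image scores (human preference, semantic alignment, visual composition) could be folded into one scalar signal for each candidate rewritten prompt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RewardWeights:
    # Hypothetical weights; the abstract does not state how rewards are balanced.
    preference: float = 1.0
    alignment: float = 1.0
    composition: float = 1.0


def combined_reward(scores: Dict[str, float], w: RewardWeights) -> float:
    """Weighted average of the three image-level scores (assumed to lie in [0, 1])."""
    total = w.preference + w.alignment + w.composition
    weighted = (
        w.preference * scores["preference"]
        + w.alignment * scores["alignment"]
        + w.composition * scores["composition"]
    )
    return weighted / total


def group_advantages(rewards: List[float]) -> List[float]:
    """Subtract the group mean so that better-than-average prompts get a positive signal.

    A mean baseline is one common choice in policy-gradient training; it is an
    assumption here, not something the abstract specifies.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# Scores for images generated from two candidate rewrites of the same user prompt.
# The numbers are made up for illustration.
candidates = [
    {"preference": 1.0, "alignment": 0.5, "composition": 0.0},
    {"preference": 0.5, "alignment": 0.5, "composition": 0.5},
]
rewards = [combined_reward(s, RewardWeights()) for s in candidates]
advantages = group_advantages(rewards)
```

In a full training loop, each advantage would weight the log-probability of the corresponding rewritten prompt, which is how the language model could receive supervision without any human-annotated target prompts.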

