RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
November 29, 2025
Authors: Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li
cs.AI
Abstract
With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable results in text-image consistency and world knowledge. However, these models still fall short in photorealistic image generation. Even on simple text-to-image (T2I) tasks, they tend to produce "fake" images with distinct AI artifacts, often characterized by "overly smooth skin" and an "oily facial sheen". To recapture the original goal of "indistinguishable-from-reality" generation, we propose RealGen, a photorealistic text-to-image framework. RealGen integrates an LLM component for prompt optimization and a diffusion model for realistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism using both semantic-level and feature-level synthetic image detectors. We use this reward signal with the GRPO algorithm to optimize the entire generation pipeline, significantly enhancing image realism and detail. Furthermore, we propose RealBench, an automated evaluation benchmark that employs Detector-Scoring and Arena-Scoring. It enables human-free photorealism assessment, yielding results that are more accurate and better aligned with real user experience. Experiments demonstrate that RealGen significantly outperforms general models such as GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models such as FLUX-Krea, in terms of realism, detail, and aesthetics. The code is available at https://github.com/yejy53/RealGen.
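The abstract does not give the reward formula, so the following is only a minimal sketch of how a detector-based reward could feed a GRPO-style update. It assumes each detector returns a probability that an image is synthetic, and that the two probabilities are combined by a weighted average. The function names, the weight `w_semantic`, and the example scores are all hypothetical and are not taken from the RealGen code. The group normalization (reward minus group mean, divided by group standard deviation) is the standard GRPO advantage.

```python
from statistics import mean, pstdev
from typing import List


def detector_reward(p_fake_semantic: float,
                    p_fake_feature: float,
                    w_semantic: float = 0.5) -> float:
    """Hypothetical reward: high when both detectors judge the image real.

    Inputs are assumed to be probabilities in [0, 1] that the image is
    synthetic, one from a semantic-level and one from a feature-level detector.
    """
    for p in (p_fake_semantic, p_fake_feature, w_semantic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities and weight must lie in [0, 1]")
    p_fake = w_semantic * p_fake_semantic + (1.0 - w_semantic) * p_fake_feature
    return 1.0 - p_fake


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up detector outputs for four images sampled from the same prompt.
    detector_scores = [(0.9, 0.8), (0.4, 0.5), (0.2, 0.1), (0.6, 0.7)]
    rewards = [detector_reward(s, f) for s, f in detector_scores]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f} advantage={a:+.2f}")
```

In this sketch, images that the detectors find harder to flag receive positive advantages relative to the other samples in their group, which is the signal a policy-gradient step would reinforce.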