RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
November 29, 2025
Authors: Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li
cs.AI
Abstract
With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable text-to-image consistency and world knowledge. However, these models still fall short in photorealistic image generation. Even on simple text-to-image (T2I) tasks, they tend to produce "fake" images with distinct AI artifacts, often characterized by "overly smooth skin" and "oily facial sheen". To recapture the original goal of "indistinguishable-from-reality" generation, we propose RealGen, a photorealistic text-to-image framework. RealGen integrates an LLM component for prompt optimization and a diffusion model for realistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism using both semantic-level and feature-level synthetic image detectors. We leverage this reward signal with the GRPO algorithm to optimize the entire generation pipeline, significantly enhancing image realism and detail. Furthermore, we propose RealBench, an automated evaluation benchmark that combines Detector-Scoring and Arena-Scoring. It enables human-free photorealism assessment, yielding results that are more accurate and better aligned with real user experience. Experiments demonstrate that RealGen significantly outperforms general models such as GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models such as FLUX-Krea, in terms of realism, detail, and aesthetics. The code is available at https://github.com/yejy53/RealGen.
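The abstract does not say how the detector outputs are combined or how the GRPO update is configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each detector returns a probability that an image is synthetic, combines the two with a hypothetical equal weighting (`w_semantic`), and then standardizes rewards within a group of images sampled for the same prompt, which is the group-relative step GRPO is named after. The function names and the sample scores are illustrative only.

```python
from statistics import mean, pstdev


def detector_reward(p_fake_semantic, p_fake_feature, w_semantic=0.5):
    """Hypothetical reward: higher when both detectors judge the image as real.

    Both inputs are assumed to be probabilities in [0, 1] that the image is
    synthetic. The weighted average is an assumption made for illustration.
    """
    if not (0.0 <= p_fake_semantic <= 1.0 and 0.0 <= p_fake_feature <= 1.0):
        raise ValueError("detector outputs must be probabilities in [0, 1]")
    p_fake = w_semantic * p_fake_semantic + (1.0 - w_semantic) * p_fake_feature
    return 1.0 - p_fake


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one prompt's group of sampled images.

    Uses the population standard deviation; eps guards against a group in
    which every sample received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four images sampled for one prompt, each scored by the two detectors as
# (semantic-level p_fake, feature-level p_fake). Values are made up.
detector_outputs = [(0.9, 0.8), (0.4, 0.5), (0.2, 0.1), (0.6, 0.7)]
rewards = [detector_reward(s, f) for s, f in detector_outputs]
advantages = group_relative_advantages(rewards)
```

In this sketch the image that both detectors find least suspicious receives the largest positive advantage, and images flagged as synthetic receive negative ones, so a policy-gradient update would push the generator toward samples the detectors cannot distinguish from real photographs.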