Gen-Searcher: Reinforcing Agentic Search for Image Generation

Abstract

Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge and thus often fail in real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models along multiple dimensions. Based on these resources, we train Gen-Searcher with supervised fine-tuning (SFT) followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
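
As a rough illustration of how dual reward feedback could feed GRPO training, the sketch below combines a text-based and an image-based reward for each rollout and then standardizes the combined rewards within a group of rollouts sampled for the same prompt. The abstract does not specify how the two rewards are combined, so the weighted sum, the `text_weight` parameter, the function names, and the example scores are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def combine_rewards(text_reward, image_reward, text_weight=0.5):
    """Blend the two reward signals into one scalar.

    Assumption: a simple weighted sum; the actual combination rule
    used for Gen-Searcher is not stated in the abstract.
    """
    return text_weight * text_reward + (1.0 - text_weight) * image_reward


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts (GRPO-style).

    Each rollout's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four rollouts of the same prompt.
text_scores = [0.9, 0.4, 0.7, 0.2]   # e.g. quality of the gathered textual knowledge
image_scores = [0.8, 0.5, 0.3, 0.1]  # e.g. quality of the generated image

rewards = [combine_rewards(t, i) for t, i in zip(text_scores, image_scores)]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f}  advantage={advantage:+.3f}")
```

In this sketch, a rollout whose search trajectory is informative but whose final image is weak still receives a non-zero reward from the text term, which is one plausible way a combined signal could be less sparse than an image-only reward.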