Gen-Searcher: Reinforcing Agentic Search for Image Generation
March 30, 2026
Authors: Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue
cs.AI
Abstract
Recent image generation models have shown strong capabilities in generating high-fidelity, photorealistic images. However, they are fundamentally constrained by frozen internal knowledge and therefore often fail in real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, the first attempt to train a search-augmented image generation agent that performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models along multiple dimensions. Based on these resources, we train Gen-Searcher with supervised fine-tuning (SFT) followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
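To make the dual reward idea concrete, the following is a minimal sketch of how a text-based and an image-based reward could feed a GRPO-style group-relative advantage. It is not the authors' implementation: the abstract does not say how the two rewards are combined, so the weighted sum, the `text_weight` parameter, the function names, the reward values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def combine_rewards(text_reward, image_reward, text_weight=0.5):
    """Hypothetical weighted sum of the two reward signals.

    The abstract only states that text-based and image-based rewards
    are combined; the weighting scheme here is an assumption.
    """
    return text_weight * text_reward + (1.0 - text_weight) * image_reward


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization within one group of rollouts.

    Each reward is centered on the group mean and scaled by the group
    standard deviation (population std is assumed here); eps avoids
    division by zero when all rollouts score the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up rollouts for one prompt: (text_reward, image_reward).
rollouts = [(0.9, 0.7), (0.4, 0.6), (0.8, 0.2), (0.1, 0.3)]
combined = [combine_rewards(t, i) for t, i in rollouts]
advantages = group_relative_advantages(combined)

for (t, i), c, a in zip(rollouts, combined, advantages):
    print(f"text={t:.1f} image={i:.1f} combined={c:.2f} advantage={a:+.3f}")
```

In this sketch a rollout is pushed up only if its combined reward beats the average of its own group, so a rollout that scores well on one signal but poorly on the other ends up near zero advantage.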