Gen-Searcher: Reinforcing Agentic Search for Image Generation

Abstract

Recent image generation models have shown strong capabilities in generating high-fidelity, photorealistic images. However, they are fundamentally constrained by frozen internal knowledge and therefore often fail in real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models along multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
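As a rough illustration of the dual reward feedback described above, the following minimal sketch blends a text-based and an image-based reward into one scalar per rollout and then computes group-relative advantages in the style of GRPO. The abstract does not say how the two rewards are combined, so the weighted sum, the `text_weight` parameter, the function names, and the sample reward values are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def combine_rewards(text_reward, image_reward, text_weight=0.5):
    """Blend the two reward signals into one scalar.

    The weighted sum and `text_weight` are hypothetical; the source only
    states that text-based and image-based rewards are combined.
    """
    return text_weight * text_reward + (1.0 - text_weight) * image_reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    This is the group-relative normalization used by GRPO. The population
    standard deviation is a choice made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one prompt: (text_reward, image_reward).
rollouts = [(0.9, 0.4), (0.2, 0.3), (0.7, 0.8), (0.5, 0.5)]
rewards = [combine_rewards(t, i) for t, i in rollouts]
advantages = group_relative_advantages(rewards)
```

In this sketch, a rollout that scores well on only one of the two signals ends up with a smaller advantage than one that scores well on both, which is one plausible reading of why a combined signal could be more informative than either reward alone.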
