SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

January 26, 2026
Authors: Fangyuan Xu, Rujun Han, Yanfei Chen, Zifeng Wang, I-Hung Hsu, Jun Yan, Vishy Tirumalashetty, Eunsol Choi, Tomas Pfister, Chen-Yu Lee
cs.AI

Abstract

Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking process. Collecting human annotations for this application is prohibitively expensive due to long and complex exploration trajectories. We propose an agentic pipeline that automatically generates high-quality, difficulty-controlled deep search question-answer pairs for a given corpus and a target difficulty level. Our pipeline, SAGE, consists of a data generator, which proposes QA pairs, and a search agent, which attempts to solve the generated question and provides execution feedback to the data generator. The two components interact over multiple rounds to iteratively refine the question-answer pairs until they satisfy the target difficulty level. Our intrinsic evaluation shows that SAGE generates questions that require diverse reasoning strategies, while significantly increasing the correctness and difficulty of the generated data. Our extrinsic evaluation demonstrates up to 23% relative performance gain on popular deep search benchmarks by training deep search agents with our synthetic data. Additional experiments show that agents trained on our data can adapt from fixed-corpus retrieval to Google Search at inference time, without further training.
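
The generator/agent interaction described in the abstract can be pictured as a simple refinement loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `QAPair`, `Feedback`, `propose`, `solve`, and `refine_until_difficult` are hypothetical, difficulty is approximated by the number of search steps the agent used, and a pair is accepted only when the agent reaches the reference answer with at least `target_steps` searches. The abstract does not specify how difficulty or correctness is actually measured.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Feedback:
    predicted_answer: str
    num_search_steps: int  # assumed difficulty proxy, not taken from the paper


def refine_until_difficult(
    propose: Callable[[Optional[QAPair], Optional[Feedback]], QAPair],
    solve: Callable[[str], Feedback],
    target_steps: int,
    max_rounds: int = 3,
) -> Optional[QAPair]:
    """Alternate between a generator and a search agent for up to max_rounds.

    `propose` stands in for the data generator: it receives the previous
    QA pair and the agent's feedback (both None in the first round).
    `solve` stands in for the search agent: it attempts the question and
    reports what it answered and how many searches it needed.
    """
    qa: Optional[QAPair] = None
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        qa = propose(qa, feedback)
        feedback = solve(qa.question)
        # Assumed acceptance rule: the agent recovers the reference answer
        # (correctness stand-in) and needed enough searches (difficulty stand-in).
        same_answer = (
            feedback.predicted_answer.strip().lower() == qa.answer.strip().lower()
        )
        if same_answer and feedback.num_search_steps >= target_steps:
            return qa
    return None  # no candidate met the target within the round budget
```

In a real system, `propose` and `solve` would wrap language-model calls and a retriever; here they are plain callables so that the control flow can be exercised with stubs.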