GrepSeek: Training Search Agents for Direct Corpus Interaction

Abstract

Large language model (LLM) search agents have shown strong promise on knowledge-intensive language tasks through multi-turn reasoning and information retrieval. Most existing systems access information through a retriever that takes a keyword or natural-language query and returns a ranked list of documents from an index of precomputed document representations. In this work, we explore a complementary perspective in which the search agent treats the corpus itself as the search environment and finds evidence by issuing executable shell commands. We introduce GrepSeek, a direct corpus interaction (DCI) approach that trains a compact search agent to find, filter, and compose evidence from large text corpora. Because learning this behavior directly with reinforcement learning on large corpora is unstable, we propose a two-stage training pipeline. First, we construct a cold-start dataset using an answer-aware Tutor and an answer-blind Planner to generate verified, causally grounded search trajectories. Second, we refine the initialized policy with Group Relative Policy Optimization (GRPO), allowing the agent to improve its task-oriented search behavior through direct interaction with the corpus. To make DCI practical at scale, we further use a semantics-preserving sharded-parallel execution engine that accelerates shell-based retrieval by up to 7.6× while preserving byte-exact equivalence with sequential execution of the same shell command. Experiments on seven open-domain question answering benchmarks show that GrepSeek achieves the strongest overall token-level F1 and Exact Match. Our analysis also highlights the limitations of purely lexical interaction on queries with substantial surface-form variation, suggesting that DCI is a practical and competitive approach for search agents that can complement existing retrieval paradigms in real-world settings.
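
The sharded-parallel idea can be illustrated with a minimal Python sketch. It is not the GrepSeek engine: the function names (`grep_lines`, `split_into_shards`, `sharded_grep`) and the toy corpus are hypothetical, and a regular-expression line filter stands in for a real shell command. The sketch only shows why a per-line filter can be run over contiguous shards and still reproduce the sequential output byte for byte, provided the per-shard results are concatenated in shard order. Commands whose output depends on global state, such as sorting or counting across the whole corpus, would need additional handling that is not shown here.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_lines(lines, pattern):
    """Return the lines matching `pattern`, in input order (a stand-in for a line filter)."""
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]


def split_into_shards(lines, num_shards):
    """Split `lines` into contiguous shards so that shard order equals corpus order."""
    size = max(1, -(-len(lines) // num_shards))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def sharded_grep(lines, pattern, num_shards=4):
    """Filter each shard in parallel, then concatenate the results in shard order."""
    shards = split_into_shards(lines, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # pool.map returns results in submission order, not completion order,
        # which is what keeps the merged output identical to a sequential run.
        per_shard = list(pool.map(lambda shard: grep_lines(shard, pattern), shards))
    return [ln for part in per_shard for ln in part]


# Hypothetical toy corpus, one document per line.
corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
    "Madrid is the capital of Spain.",
    "Paris hosted the 1900 Summer Olympics.",
]

sequential = grep_lines(corpus, r"Paris")
parallel = sharded_grep(corpus, r"Paris", num_shards=3)

# Byte-level comparison of the two outputs.
assert "\n".join(parallel).encode("utf-8") == "\n".join(sequential).encode("utf-8")
```

Threads are used here only to keep the sketch short; they demonstrate the ordering guarantee, not the reported speedup, which would depend on real process-level parallelism over on-disk shards.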