Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search
April 9, 2026
Authors: Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang
cs.AI
Abstract
Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcome rewards, leading to inefficient reasoning trajectories and unstable training. To address these issues, we propose a novel framework, Hierarchical Experience (HiExp), to enhance the performance and training stability of search agents. Specifically, we extract empirical knowledge through contrastive analysis and a multi-level clustering mechanism, transforming raw reasoning trajectories into hierarchical experience knowledge. By leveraging experience-aligned training, we effectively regularize stochastic exploration, evolving it into a strategic and experience-driven search process. Extensive evaluations on multiple complex agentic search and mathematical reasoning benchmarks demonstrate that our approach not only achieves substantial performance gains but also exhibits strong cross-task and cross-algorithm generalization.
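To make the pipeline described in the abstract concrete, below is a minimal, illustrative Python sketch, not the authors' implementation: successful and failed trajectories are contrasted to extract notes, the notes are grouped into a two-level hierarchy (a crude stand-in for the paper's multi-level clustering), and the resulting experience is rendered as a block that could be prepended to the agent's context during experience-aligned training. All names and values here (Trajectory, contrastive_notes, hierarchical_cluster, the 0.5 reward threshold) are assumptions for illustration only.

```python
# Illustrative sketch of a hierarchical-experience pipeline (not HiExp itself):
# contrast successful vs. failed trajectories, bucket the resulting notes at two
# levels of granularity, and turn the hierarchy into an experience prompt.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Trajectory:
    question: str
    steps: list[str]   # e.g. ["search: author of X", "answer: A"]
    reward: float      # outcome reward from the RL environment

def contrastive_notes(trajs: list[Trajectory], threshold: float = 0.5) -> list[str]:
    """Compare high- and low-reward trajectories on the same question and keep
    the actions that only the successful ones took (a simple contrastive analysis)."""
    by_question = defaultdict(list)
    for t in trajs:
        by_question[t.question].append(t)
    notes = []
    for question, group in by_question.items():
        good = {s for t in group if t.reward >= threshold for s in t.steps}
        bad = {s for t in group if t.reward < threshold for s in t.steps}
        for step in good - bad:
            notes.append(f"[{question}] helpful action: {step}")
    return notes

def hierarchical_cluster(notes: list[str]) -> dict[str, dict[str, list[str]]]:
    """Stand-in for multi-level clustering: bucket notes by action type
    (coarse level) and by the first keyword of the action (fine level)."""
    hierarchy: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for note in notes:
        action = note.split("helpful action:")[1].strip()
        coarse = action.split(":")[0]                      # e.g. "search" vs. "answer"
        tokens = action.split()
        fine = tokens[1] if len(tokens) > 1 else "misc"
        hierarchy[coarse][fine].append(note)
    return hierarchy

def experience_prompt(question: str, hierarchy: dict[str, dict[str, list[str]]]) -> str:
    """Render the hierarchy as a compact experience block that would be
    prepended to the agent's context during experience-aligned training."""
    lines = [f"Past experience relevant to: {question}"]
    for coarse, fine_groups in hierarchy.items():
        lines.append(f"- When you {coarse}:")
        for fine, group_notes in fine_groups.items():
            lines.append(f"  * ({fine}) {len(group_notes)} prior successful pattern(s)")
    return "\n".join(lines)

if __name__ == "__main__":
    trajs = [
        Trajectory("Who wrote X?", ["search: author of X", "answer: A"], 1.0),
        Trajectory("Who wrote X?", ["search: X", "answer: B"], 0.0),
    ]
    hierarchy = hierarchical_cluster(contrastive_notes(trajs))
    print(experience_prompt("Who wrote X?", hierarchy))
```

In this toy run, only the actions taken by the rewarded trajectory survive the contrast step, so the printed experience block summarizes them by action type; in the paper's setting, the analogous experience knowledge is what regularizes the otherwise stochastic exploration.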