Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

Abstract

Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcome rewards, leading to inefficient reasoning trajectories and unstable training. To address these issues, we propose a novel framework, Hierarchical Experience (HiExp), to enhance the performance and training stability of search agents. Specifically, we extract empirical knowledge through contrastive analysis and a multi-level clustering mechanism, transforming raw reasoning trajectories into hierarchical experience knowledge. By leveraging experience-aligned training, we effectively regularize stochastic exploration, evolving it into a strategic and experience-driven search process. Extensive evaluations on multiple complex agentic search and mathematical reasoning benchmarks demonstrate that our approach not only achieves substantial performance gains but also exhibits strong cross-task and cross-algorithm generalization.

