Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

Abstract

Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcome rewards, leading to inefficient reasoning trajectories and unstable training. To address these issues, we propose a novel framework, Hierarchical Experience (HiExp), to enhance the performance and training stability of search agents. Specifically, we extract empirical knowledge through contrastive analysis and a multi-level clustering mechanism, transforming raw reasoning trajectories into hierarchical experience knowledge. By leveraging experience-aligned training, we effectively regularize stochastic exploration, evolving it into a strategic and experience-driven search process. Extensive evaluations on multiple complex agentic search and mathematical reasoning benchmarks demonstrate that our approach not only achieves substantial performance gains but also exhibits strong cross-task and cross-algorithm generalization.
