RAVine: Reality-Aligned Evaluation for Agentic Search

Abstract

Agentic search, a more autonomous and adaptive paradigm of retrieval augmentation, is driving the evolution of intelligent search systems. However, existing evaluation frameworks are poorly aligned with the goals of agentic search. First, the complex queries commonly used in current benchmarks often deviate from realistic user search scenarios. Second, prior approaches tend to introduce noise when extracting ground truth for end-to-end evaluation, which distorts assessments at a fine-grained level. Third, most current frameworks focus solely on the quality of final answers and neglect the iterative process inherent to agentic search. To address these limitations, we propose RAVine, a Reality-Aligned eValuation framework for agentic LLMs with search. RAVine targets multi-point queries and long-form answers that better reflect user intent, and introduces an attributable ground-truth construction strategy to improve the accuracy of fine-grained evaluation. Moreover, RAVine examines the model's interaction with search tools throughout the iterative process and accounts for efficiency. We benchmark a series of models using RAVine and derive several insights, which we hope will contribute to the development of agentic search systems. The code and datasets are available at https://github.com/SwordFaith/RAVine.

