Context Tuning for Retrieval Augmented Generation

Abstract

Large language models (LLMs) have the remarkable ability to solve new tasks with just a few examples, but they need access to the right tools. Retrieval Augmented Generation (RAG) addresses this problem by retrieving a list of relevant tools for a given task. However, RAG's tool retrieval step requires all the necessary information to be explicitly present in the query. This is a limitation, as semantic search, the widely adopted tool retrieval method, can fail when the query is incomplete or lacks context. To address this limitation, we propose Context Tuning for RAG, which employs a smart context retrieval system to fetch relevant information that improves both tool retrieval and plan generation. Our lightweight context retrieval model uses numerical, categorical, and habitual usage signals to retrieve and rank context items. Our empirical results demonstrate that context tuning significantly enhances semantic search, achieving a 3.5-fold and 1.5-fold improvement in Recall@K for the context retrieval and tool retrieval tasks, respectively, and resulting in an 11.6% increase in LLM-based planner accuracy. Additionally, we show that our proposed lightweight model using Reciprocal Rank Fusion (RRF) with LambdaMART outperforms GPT-4-based retrieval. Moreover, we observe that context augmentation at plan generation, even after tool retrieval, reduces hallucination.
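To make the two retrieval terms in the abstract concrete, the following is a minimal sketch of plain Reciprocal Rank Fusion and of Recall@K. It is not the paper's model: the LambdaMART component is omitted, the smoothing constant `k=60` is the value conventionally used with RRF and is assumed here, and the three signal rankings and the context item names are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of `ranked`."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical rankings of context items, one per usage signal.
numerical = ["calendar", "location", "contacts", "music"]
categorical = ["location", "calendar", "music", "contacts"]
habitual = ["location", "music", "calendar", "contacts"]

fused = reciprocal_rank_fusion([numerical, categorical, habitual])
relevant = {"location", "music"}  # hypothetical ground truth

print(fused)
print(recall_at_k(fused, relevant, k=2))
print(recall_at_k(fused, relevant, k=3))
```

In this toy setup the fused list places `location` first because two of the three signals rank it at the top, and Recall@K rises as K grows to include `music`. A learned ranker such as LambdaMART would replace or complement the fixed `1 / (k + rank)` weighting with weights fitted to labeled data.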