TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering

Abstract

Retrieval-augmented generation (RAG) systems face significant challenges in multi-hop question answering (MHQA), where complex queries require synthesizing information across multiple document chunks. Existing approaches typically rely on iterative LLM-based query rewriting and routing, which incurs high computational costs from repeated LLM invocations and multi-stage processing. To address these limitations, we propose TreeHop, an embedding-level framework that does not require LLMs for query refinement. TreeHop dynamically updates query embeddings by fusing semantic information from prior queries and retrieved documents, enabling iterative retrieval through embedding-space operations alone. This method replaces the traditional "Retrieve-Rewrite-Vectorize-Retrieve" cycle with a streamlined "Retrieve-Embed-Retrieve" loop, significantly reducing computational overhead. Moreover, a rule-based stop criterion is introduced to further prune redundant retrievals, balancing efficiency and recall rate. Experimental results show that TreeHop rivals advanced RAG methods across three open-domain MHQA datasets, achieving comparable performance with only 0.4% to 5% of the model parameter size and reducing query latency by approximately 99% compared to concurrent approaches. This makes TreeHop a faster and more cost-effective solution for deployment in a range of knowledge-intensive applications. For reproducibility, code and data are available at https://github.com/allen-li1231/TreeHop.
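To make the "Retrieve-Embed-Retrieve" loop concrete, the following is a minimal sketch in Python with NumPy, not the TreeHop implementation. The abstract does not specify the fusion function or the stop rule, so everything here is an assumption for illustration: `placeholder_update` (a simple average standing in for the learned embedding update), the "skip chunks already retrieved" stop rule, and the names `retrieve` and `multi_hop_retrieve`.

```python
import numpy as np


def retrieve(query_emb, chunk_embs, top_k):
    """Return indices of the top_k chunks by inner-product similarity."""
    scores = chunk_embs @ query_emb
    # A stable sort keeps tie-breaking deterministic (lower index first).
    return np.argsort(-scores, kind="stable")[:top_k].tolist()


def placeholder_update(query_emb, chunk_emb):
    """Hypothetical stand-in for a learned fusion module (assumption):
    average the query and chunk embeddings, then renormalize."""
    fused = 0.5 * (query_emb + chunk_emb)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused


def multi_hop_retrieve(query_emb, chunk_embs, update_fn=placeholder_update,
                       top_k=2, max_hops=2):
    """Each retrieved chunk spawns one next-hop query embedding; no LLM call
    or text re-encoding happens between hops."""
    seen = []                 # chunk indices in the order they were retrieved
    frontier = [query_emb]    # query embeddings to expand at the current hop
    for _ in range(max_hops):
        next_frontier = []
        for q in frontier:
            for idx in retrieve(q, chunk_embs, top_k):
                if idx in seen:
                    # Illustrative stop rule (assumption): a chunk that was
                    # already retrieved does not spawn another query.
                    continue
                seen.append(idx)
                next_frontier.append(update_fn(q, chunk_embs[idx]))
        if not next_frontier:  # every branch was pruned, so stop early
            break
        frontier = next_frontier
    return seen
```

Because each retrieved chunk branches into its own next query, the number of queries can grow by a factor of `top_k` per hop, which is why some pruning rule is needed to keep the search tractable. In this sketch the update function is a parameter, so a trained module could be substituted for the placeholder without changing the loop.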
