RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Abstract

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, which limits holistic understanding of the overall document context. We introduce a novel approach that recursively embeds, clusters, and summarizes chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
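The embed, cluster, summarize loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `embed` is a toy bag-of-words vector standing in for a real text encoder, `cluster` is a simple greedy grouping standing in for whatever clustering method the paper uses, and `summarize` is a truncation placeholder where a language model would normally be called. The `retrieve` function, which ranks nodes from every level of the tree together, is likewise only one plausible reading of "retrieves from this tree".

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                                   # 0 = leaf chunk, higher = more abstract
    children: list = field(default_factory=list)


def embed(text):
    """Toy bag-of-words embedding (hypothetical stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(nodes, size):
    """Greedy grouping by similarity to a seed node (assumed, not the paper's method)."""
    remaining = list(nodes)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        groups.append([seed] + remaining[:size - 1])
        remaining = remaining[size - 1:]
    return groups


def summarize(texts, max_words=12):
    """Placeholder summary: the first few words of each text.
    A real system would call a language model here."""
    per_text = max(1, max_words // len(texts))
    return " ".join(" ".join(t.split()[:per_text]) for t in texts)


def build_tree(chunks, cluster_size=2):
    """Build summary layers bottom-up until one root remains; return all nodes."""
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2 so each layer shrinks")
    layer = [Node(text=c, level=0) for c in chunks]
    all_nodes = list(layer)
    level = 0
    while len(layer) > 1:
        level += 1
        layer = [
            Node(summarize([n.text for n in group]), level, group)
            for group in cluster(layer, cluster_size)
        ]
        all_nodes.extend(layer)
    return all_nodes


def retrieve(all_nodes, query, k=2):
    """Rank nodes from every level together and return the k most similar."""
    q = embed(query)
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "Cinderella lives with her stepmother and two stepsisters",
        "The stepsisters make Cinderella do all the chores",
        "A fairy godmother gives Cinderella a gown and a carriage",
        "The prince finds Cinderella using the glass slipper",
    ]
    tree = build_tree(chunks)
    for node in retrieve(tree, "who helps Cinderella get a gown", k=2):
        print(node.level, node.text)
```

Because leaves and summaries sit in the same candidate pool, a detail-oriented query can match a leaf chunk while a thematic query can match a higher-level summary, which is the behavior the abstract attributes to retrieving at different levels of abstraction.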
