RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
January 31, 2024
Authors: Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning
cs.AI
Abstract
Retrieval-augmented language models can better adapt to changes in world
state and incorporate long-tail knowledge. However, most existing methods
retrieve only short contiguous chunks from a retrieval corpus, limiting
holistic understanding of the overall document context. We introduce the novel
approach of recursively embedding, clustering, and summarizing chunks of text,
constructing a tree with differing levels of summarization from the bottom up.
At inference time, our RAPTOR model retrieves from this tree, integrating
information across lengthy documents at different levels of abstraction.
Controlled experiments show that retrieval with recursive summaries offers
significant improvements over traditional retrieval-augmented LMs on several
tasks. On question-answering tasks that involve complex, multi-step reasoning,
we show state-of-the-art results; for example, by coupling RAPTOR retrieval
with the use of GPT-4, we can improve the best performance on the QuALITY
benchmark by 20% in absolute accuracy.
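
The bottom-up construction described in the abstract (embed, cluster, summarize, repeat) and retrieval across abstraction levels can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the bag-of-words `embed`, the greedy nearest-neighbour `cluster`, the truncating `summarize`, and the choice to rank all nodes of the tree together in `retrieve` are toy stand-ins for a learned encoder, a real clustering method, and a language-model summarizer.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                      # 0 = original chunk, higher = more abstract
    children: list = field(default_factory=list)


def embed(text):
    """Toy bag-of-words embedding (hypothetical stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes, size=2):
    """Greedy grouping: take the first unassigned node plus its nearest neighbours."""
    remaining = list(nodes)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        clusters.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return clusters


def summarize(texts, max_words=6):
    """Placeholder summarizer: keep the first few words of each text.
    A real system would call a language model here."""
    return " ".join(" ".join(t.split()[:max_words]) for t in texts)


def build_tree(chunks, cluster_size=2):
    """Build the tree bottom-up; returns (root, list of all nodes at every level)."""
    if not chunks or cluster_size < 2:
        raise ValueError("need at least one chunk and cluster_size >= 2")
    layer = [Node(c, 0) for c in chunks]
    all_nodes = list(layer)
    while len(layer) > 1:
        next_level = layer[0].level + 1
        parents = [
            Node(summarize([n.text for n in group]), next_level, group)
            for group in cluster(layer, cluster_size)
        ]
        all_nodes.extend(parents)
        layer = parents
    return layer[0], all_nodes


def retrieve(query, all_nodes, k=2):
    """Rank leaves and summaries together, so results may mix abstraction levels."""
    q = embed(query)
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "the cat chased a mouse",
        "stocks fell sharply on monday",
        "stocks recovered by friday",
    ]
    root, nodes = build_tree(chunks)
    for node in retrieve("cat mouse", nodes):
        print(node.level, node.text)
```

Because summaries sit in the same candidate pool as the original chunks, a detail-oriented query tends to match a leaf, while a broad query can match a higher-level summary that spans several chunks. That is the intuition behind retrieving "at different levels of abstraction"; the ranking strategy shown is only one possible way to realize it.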