RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Abstract

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
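The pipeline the abstract describes (embed, cluster, summarize, repeat, then retrieve across levels) can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words embed, the greedy cluster, and the truncating summarize functions are hypothetical stand-ins chosen so the example runs with the standard library alone; a real system would presumably use a learned encoder, a proper clustering algorithm, and a language model for summarization. Scoring every node of every level in a single pass is likewise an assumption about how retrieval from the tree might be done.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                      # 0 = original chunk, higher = more abstract
    children: list = field(default_factory=list)


def embed(text):
    """Toy embedding (assumption): bag-of-words counts instead of a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes, size):
    """Toy clustering (assumption): seed with the first unassigned node,
    then attach its size - 1 most similar remaining nodes."""
    remaining = list(nodes)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        groups.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return groups


def summarize(texts, max_words=12):
    """Placeholder summarizer (assumption): keep the leading words of each text.
    A language model call would go here in a real system."""
    per_text = max(1, max_words // len(texts))
    return " ".join(" ".join(t.split()[:per_text]) for t in texts)


def build_tree(chunks, cluster_size=2):
    """Build the tree bottom-up; return the root and a flat list of all nodes."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2 so each layer shrinks")
    layer = [Node(c, 0) for c in chunks]
    all_nodes = list(layer)
    while len(layer) > 1:
        next_layer = [
            Node(summarize([n.text for n in group]), group[0].level + 1, group)
            for group in cluster(layer, cluster_size)
        ]
        all_nodes.extend(next_layer)
        layer = next_layer
    return layer[0], all_nodes


def retrieve(query, all_nodes, k=2):
    """Rank leaves and summaries together, so hits can come from any level."""
    q = embed(query)
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "Cinderella lives with her stepmother and stepsisters.",
        "A fairy godmother gives Cinderella a gown and a coach.",
        "The prince hosts a ball at the palace.",
        "The prince searches the kingdom with a glass slipper.",
    ]
    root, nodes = build_tree(chunks)
    for node in retrieve("Who helps Cinderella?", nodes, k=3):
        print(node.level, node.text)
```

Because leaf chunks and higher-level summaries are ranked in the same pool, a detail-oriented query can match a leaf while a thematic query can match a summary node, which is the multi-level integration the abstract refers to.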
