RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Abstract

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
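
The bottom-up construction described above (embed, cluster, summarize, repeat) can be illustrated with a small sketch. This is a toy under stated assumptions, not the RAPTOR implementation: `embed` is a bag-of-words stand-in for a real text encoder, `cluster` is a greedy nearest-neighbor grouping, `summarize` simply truncates the joined text where a real system would call a language model, and `retrieve` ranks every node in the tree at once, which is only one possible way to query it. All function names and parameters are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                      # 0 for leaf chunks, higher for summaries
    children: list = field(default_factory=list)


def embed(text):
    """Toy bag-of-words embedding (stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(nodes, size=2):
    """Greedy stand-in clustering: pair each seed with its nearest unassigned nodes."""
    remaining = list(nodes)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        groups.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return groups


def summarize(texts, max_words=12):
    """Stand-in summarizer: keeps the first words of the joined texts."""
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def build_tree(chunks, size=2):
    """Build the tree bottom-up and return all nodes, leaves first, root last."""
    if size < 2:
        raise ValueError("size must be at least 2 so each level shrinks")
    level = [Node(c, 0) for c in chunks]
    all_nodes = list(level)
    while len(level) > 1:
        parents = [
            Node(summarize([n.text for n in group]), level[0].level + 1, group)
            for group in cluster(level, size)
        ]
        all_nodes.extend(parents)
        level = parents
    return all_nodes


def retrieve(all_nodes, query, k=3):
    """Rank leaves and summaries together by similarity to the query."""
    q = embed(query)
    return sorted(all_nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]
```

A possible usage: `nodes = build_tree(["cats chase mice", "dogs chase cats", "birds eat seeds"])` followed by `retrieve(nodes, "who chases cats", k=2)`. Because leaves and summaries sit in the same candidate pool, a query can match either a specific chunk or a higher-level summary, which is the idea of retrieving at different levels of abstraction.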
