RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
January 31, 2024
Authors: Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning
cs.AI
Abstract
Retrieval-augmented language models can better adapt to changes in world
state and incorporate long-tail knowledge. However, most existing methods
retrieve only short contiguous chunks from a retrieval corpus, limiting
holistic understanding of the overall document context. We introduce the novel
approach of recursively embedding, clustering, and summarizing chunks of text,
constructing a tree with differing levels of summarization from the bottom up.
At inference time, our RAPTOR model retrieves from this tree, integrating
information across lengthy documents at different levels of abstraction.
Controlled experiments show that retrieval with recursive summaries offers
significant improvements over traditional retrieval-augmented LMs on several
tasks. On question-answering tasks that involve complex, multi-step reasoning,
we show state-of-the-art results; for example, by coupling RAPTOR retrieval
with the use of GPT-4, we can improve the best performance on the QuALITY
benchmark by 20% in absolute accuracy.
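The abstract describes RAPTOR's pipeline only at a high level. Text chunks are embedded, clustered, and summarized. The same steps are then repeated on the summaries to build a tree from the bottom up, and the model retrieves from that tree at inference time.

Below is a minimal, dependency-free Python sketch of that loop. It is an illustration under stated assumptions, not the authors' implementation:

- `embed` (bag-of-words counts), `summarize` (word truncation), and `cluster` (greedy cosine-threshold grouping) are hypothetical stand-ins for a real embedding model, an LLM summarizer, and the paper's own clustering step.
- `retrieve` ranks nodes from every level in one shared pool. This is just one plausible reading of "retrieves from this tree".
- The names `Node`, `build_tree`, `threshold`, and `max_levels` are likewise illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int  # 0 = original chunk; higher levels = more abstract summaries
    vec: Counter = field(repr=False)
    children: list = field(default_factory=list, repr=False)


def embed(text: str) -> Counter:
    # HYPOTHETICAL stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(texts: list[str], max_words: int = 20) -> str:
    # HYPOTHETICAL stand-in for an LLM summarizer: keep the leading words of each text.
    per_text = max(1, max_words // len(texts))
    return " ".join(" ".join(t.split()[:per_text]) for t in texts)


def cluster(nodes: list[Node], threshold: float) -> list[list[Node]]:
    # ASSUMED greedy clustering: join the first group whose seed node is similar
    # enough, otherwise start a new group. Not the paper's clustering method.
    groups: list[list[Node]] = []
    for node in nodes:
        for group in groups:
            if cosine(node.vec, group[0].vec) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def build_tree(chunks: list[str], threshold: float = 0.2, max_levels: int = 3) -> list[Node]:
    """Recursively embed, cluster, and summarize bottom-up; return nodes of every level."""
    layer = [Node(c, 0, embed(c)) for c in chunks]
    all_nodes = list(layer)
    for level in range(1, max_levels + 1):
        groups = cluster(layer, threshold)
        if len(groups) == len(layer):  # no two nodes merged, so stop recursing
            break
        layer = []
        for group in groups:
            summary = summarize([n.text for n in group])
            layer.append(Node(summary, level, embed(summary), children=group))
        all_nodes.extend(layer)
        if len(layer) == 1:  # a single root summarizes everything
            break
    return all_nodes


def retrieve(nodes: list[Node], query: str, k: int = 3) -> list[Node]:
    # ASSUMED strategy: rank nodes from all levels in one pool, so a detailed leaf
    # and a higher-level summary can both be returned for the same query.
    q = embed(query)
    return sorted(nodes, key=lambda n: cosine(q, n.vec), reverse=True)[:k]


chunks = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices fell sharply today",
    "stock markets fell on rate fears",
]
nodes = build_tree(chunks)
for node in retrieve(nodes, "why did stock prices fall", k=2):
    print(node.level, node.text)
# 0 stock prices fell sharply today
# 1 stock prices fell sharply today stock markets fell on rate fears
```

In this toy run, the query about falling stock prices returns two nodes: a level-0 chunk, and the level-1 summary node built from the two finance chunks. Together they illustrate retrieval "at different levels of abstraction". With real models, a summary node would hold condensed information drawn from several chunks, not concatenated leading words.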