Long-range Language Modeling with Self-retrieval

June 23, 2023
Authors: Ohad Rubin, Jonathan Berant
cs.AI

Abstract

Retrieval-augmented language models (LMs) have received much attention recently. However, typically the retriever is not trained jointly as a native component of the LM, but added to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch for the task of modeling long texts. Given a recently generated text chunk in a long document, the LM computes query representations, which are then used to retrieve earlier chunks in the document, located potentially tens of thousands of tokens before. Information from retrieved chunks is fused into the LM representations to predict the next target chunk. We train the retriever component with a semantic objective, where the goal is to retrieve chunks that increase the probability of the next chunk, according to a reference LM. We evaluate RPT on four long-range language modeling tasks, spanning books, code, and mathematical writing, and demonstrate that RPT improves retrieval quality and subsequently perplexity across the board compared to strong baselines.
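The sketch below is a minimal, self-contained illustration of the chunk-level self-retrieval idea described in the abstract: score earlier chunks of the same document against the current chunk, retrieve the best ones, and derive retrieval training targets from how much each candidate chunk raises a reference LM's probability of the next chunk. It is not the authors' implementation; `embed_chunk` and `reference_lm_logprob` are hypothetical placeholders for the LM's learned encoder and a reference language model.

```python
# Toy sketch of chunk-level self-retrieval (not the RPT code).
import numpy as np


def embed_chunk(chunk_tokens, dim=64):
    """Hypothetical stand-in for the LM's chunk encoder: a deterministic
    random projection keyed on the chunk's token ids."""
    seed = abs(hash(tuple(chunk_tokens))) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)


def retrieve_top_k(current_chunk, past_chunks, k=2):
    """Score every earlier chunk in the document against the current chunk's
    query embedding and return the indices of the k best-scoring chunks."""
    query = embed_chunk(current_chunk)
    scores = [float(query @ embed_chunk(chunk)) for chunk in past_chunks]
    return sorted(range(len(past_chunks)), key=lambda i: -scores[i])[:k]


def retrieval_targets(past_chunks, next_chunk, reference_lm_logprob):
    """Sketch of the semantic training signal: for each candidate chunk,
    measure how much conditioning on it raises the reference LM's
    log-probability of the next chunk."""
    baseline = reference_lm_logprob(next_chunk, context=())
    return [reference_lm_logprob(next_chunk, context=tuple(chunk)) - baseline
            for chunk in past_chunks]


if __name__ == "__main__":
    # Toy usage: chunks are lists of token ids; retrieve the 2 most relevant
    # earlier chunks for the most recent one.
    document_chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 2, 4]]
    current = document_chunks[-1]
    print(retrieve_top_k(current, document_chunks[:-1], k=2))
```

In the model described by the abstract, the query representations are produced by the LM itself and the retrieved chunks are fused back into its representations before predicting the next target chunk; the toy above only mirrors the scoring step and the reference-LM training-target structure.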