

Long-range Language Modeling with Self-retrieval

June 23, 2023
Authors: Ohad Rubin, Jonathan Berant
cs.AI

Abstract

Retrieval-augmented language models (LMs) have received much attention recently. However, typically the retriever is not trained jointly as a native component of the LM, but added to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch for the task of modeling long texts. Given a recently generated text chunk in a long document, the LM computes query representations, which are then used to retrieve earlier chunks in the document, located potentially tens of thousands of tokens before. Information from retrieved chunks is fused into the LM representations to predict the next target chunk. We train the retriever component with a semantic objective, where the goal is to retrieve chunks that increase the probability of the next chunk, according to a reference LM. We evaluate RPT on four long-range language modeling tasks, spanning books, code, and mathematical writing, and demonstrate that RPT improves retrieval quality and subsequently perplexity across the board compared to strong baselines.
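To make the mechanism described in the abstract concrete, below is a minimal sketch of the core loop: the document is split into fixed-length chunks, a query built from the current chunk scores earlier chunks, and the retriever is trained toward chunks that raise a reference LM's probability of the next chunk. This is an illustrative assumption-laden sketch, not the authors' implementation; all names (`RetrieverScorer`, `semantic_retriever_targets`, `CHUNK_LEN`, the KL form of the loss) are hypothetical.

```python
# Hedged sketch of an RPT-style retrieval loop; names and the exact loss form
# are assumptions made for illustration, not the paper's code.
import torch
import torch.nn.functional as F

CHUNK_LEN = 64
DIM = 256

def chunk_document(token_ids, chunk_len=CHUNK_LEN):
    """Split a long token sequence into consecutive fixed-length chunks."""
    return [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), chunk_len)]

class RetrieverScorer(torch.nn.Module):
    """Bi-encoder-style scorer: a query from the current chunk against keys
    from chunks that appeared earlier in the same document."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.query_proj = torch.nn.Linear(dim, dim)
        self.key_proj = torch.nn.Linear(dim, dim)

    def forward(self, current_repr, past_reprs):
        q = self.query_proj(current_repr)        # (dim,)
        k = self.key_proj(past_reprs)            # (num_past_chunks, dim)
        return (k @ q) / DIM ** 0.5              # one score per earlier chunk

def semantic_retriever_targets(gain_in_logprob, temperature=1.0):
    """Soft targets: earlier chunks that increase the reference LM's
    log-probability of the next chunk get more target mass."""
    return F.softmax(gain_in_logprob / temperature, dim=-1)

def retriever_loss(scores, gain_in_logprob):
    """KL between the retriever's distribution over earlier chunks and the
    distribution implied by reference-LM gains (the exact form is an
    assumption; the abstract only states the goal of the objective)."""
    log_pred = F.log_softmax(scores, dim=-1)
    target = semantic_retriever_targets(gain_in_logprob)
    return F.kl_div(log_pred, target, reduction="sum")

if __name__ == "__main__":
    torch.manual_seed(0)
    scorer = RetrieverScorer()
    past_reprs = torch.randn(10, DIM)        # representations of 10 earlier chunks
    current_repr = torch.randn(DIM)          # representation of the current chunk
    scores = scorer(current_repr, past_reprs)
    top_k = torch.topk(scores, k=2).indices  # chunks whose content would be fused into the LM
    gains = torch.randn(10)                  # toy per-chunk gains in next-chunk log-prob
    loss = retriever_loss(scores, gains)
    print("retrieved chunk indices:", top_k.tolist(), "loss:", float(loss))
```

In the full architecture, the representations of the top-scoring chunks would be fused into the LM (e.g., via cross-attention) before predicting the next target chunk; the sketch above only covers the scoring and the retriever's training signal.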