In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
February 16, 2024
Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
cs.AI
Abstract
This paper addresses the challenge of processing long documents using
generative transformer models. To evaluate different approaches, we introduce
BABILong, a new benchmark designed to assess model capabilities in extracting
and processing distributed facts within extensive texts. Our evaluation, which
includes benchmarks for GPT-4 and RAG, reveals that common methods are
effective only for sequences up to 10^4 elements. In contrast, fine-tuning
GPT-2 with recurrent memory augmentations enables it to handle tasks involving
up to 10^7 elements. This achievement marks a substantial leap, as it is by
far the longest input processed by any open neural network model to date,
demonstrating a significant improvement in the processing capabilities for long
sequences.
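
The key idea behind the recurrent memory augmentation is to split the long input into fixed-size segments and carry a small set of memory tokens from one segment to the next, so the model only ever attends over one segment at a time. Below is a minimal sketch of this segment-wise recurrence in PyTorch; the class `RecurrentMemoryEncoder`, its dimensions, and the use of a generic `nn.TransformerEncoder` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of recurrent-memory processing over a long input.
# All names and hyperparameters here are illustrative, not the authors' code.
import torch
import torch.nn as nn


class RecurrentMemoryEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, num_mem_tokens=16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable memory tokens prepended to every segment.
        self.mem_tokens = nn.Parameter(torch.randn(1, num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        """segments: list of (batch, seg_len, d_model) embedded segments."""
        batch = segments[0].size(0)
        memory = self.mem_tokens.expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Concatenate the current memory state with the segment tokens.
            x = torch.cat([memory, seg], dim=1)
            h = self.encoder(x)
            # The updated memory is read back from the memory-token positions
            # and passed to the next segment, forming the recurrence.
            memory = h[:, :self.num_mem_tokens]
            outputs.append(h[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory


# Usage: split a long embedded sequence into fixed-size segments and process
# them sequentially; attention cost stays bounded by the segment length,
# while facts from early segments can persist in the memory tokens.
model = RecurrentMemoryEncoder()
long_input = torch.randn(2, 4096, 256)           # embedded tokens
segments = list(long_input.split(512, dim=1))    # 8 segments of 512 tokens
out, final_memory = model(segments)
print(out.shape, final_memory.shape)             # (2, 4096, 256), (2, 16, 256)
```

Because each step only attends over one segment plus the memory tokens, the total cost grows roughly linearly with input length, which is what makes scaling to millions of elements feasible in principle.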