In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
February 16, 2024
Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
cs.AI
Abstract
This paper addresses the challenge of processing long documents using
generative transformer models. To evaluate different approaches, we introduce
BABILong, a new benchmark designed to assess model capabilities in extracting
and processing distributed facts within extensive texts. Our evaluation, which
includes benchmarks for GPT-4 and RAG, reveals that common methods are
effective only for sequences up to 10^4 elements. In contrast, fine-tuning
GPT-2 with recurrent memory augmentations enables it to handle tasks involving
up to 10^7 elements. This achievement marks a substantial leap, as it is by
far the longest input processed by any open neural network model to date,
demonstrating a significant improvement in the processing capabilities for long
sequences.
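
The key idea behind the recurrent memory augmentation is to split the long input into fixed-size segments and carry a small set of memory tokens from one segment to the next, so the model only ever attends over one segment at a time. Below is a minimal sketch of this segment-wise recurrence in PyTorch; the class `RecurrentMemoryEncoder`, its dimensions, and the use of a generic `nn.TransformerEncoder` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of recurrent-memory processing over a long input.
# All names and hyperparameters here are illustrative, not the authors' code.
import torch
import torch.nn as nn


class RecurrentMemoryEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, num_mem_tokens=16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable memory tokens prepended to every segment.
        self.mem_tokens = nn.Parameter(torch.randn(1, num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        """segments: list of (batch, seg_len, d_model) embedded segments."""
        batch = segments[0].size(0)
        memory = self.mem_tokens.expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Concatenate the current memory state with the segment tokens.
            x = torch.cat([memory, seg], dim=1)
            h = self.encoder(x)
            # The updated memory is read back from the memory-token positions
            # and passed to the next segment, forming the recurrence.
            memory = h[:, :self.num_mem_tokens]
            outputs.append(h[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory


# Usage: split a long embedded sequence into fixed-size segments and process
# them sequentially; attention cost stays bounded by the segment length,
# while facts from early segments can persist in the memory tokens.
model = RecurrentMemoryEncoder()
long_input = torch.randn(2, 4096, 256)           # embedded tokens
segments = list(long_input.split(512, dim=1))    # 8 segments of 512 tokens
out, final_memory = model(segments)
print(out.shape, final_memory.shape)             # (2, 4096, 256), (2, 16, 256)
```

Because each step only attends over one segment plus the memory tokens, the total cost grows roughly linearly with input length, which is what makes scaling to millions of elements feasible in principle.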