

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

February 16, 2024
Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
cs.AI

Abstract

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to 10^4 elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 10^7 elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.
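The recurrent memory augmentation mentioned in the abstract processes a long input segment by segment, carrying a small set of memory vectors from one segment to the next so that only a fixed-size window is ever inside the attention computation. The sketch below is a minimal illustration of that idea, not the authors' implementation: the class and parameter names (SimpleRecurrentMemoryModel, num_mem, seg_len) are assumptions made for this example, and the two-layer encoder stands in for a full pretrained model such as GPT-2.

```python
# Minimal sketch (assumed names, not the paper's code) of segment-wise
# processing with recurrent memory: memory tokens are prepended to each
# segment, updated by the transformer, and passed to the next segment.
import torch
import torch.nn as nn

class SimpleRecurrentMemoryModel(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, num_mem=4, seg_len=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learned initial memory slots, shared across inputs.
        self.memory = nn.Parameter(torch.randn(num_mem, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.num_mem = num_mem
        self.seg_len = seg_len

    def forward(self, tokens):
        # tokens: (batch, total_len); total_len may far exceed seg_len.
        batch = tokens.size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for start in range(0, tokens.size(1), self.seg_len):
            seg = self.embed(tokens[:, start:start + self.seg_len])
            # Prepend memory tokens to the current segment.
            x = self.encoder(torch.cat([mem, seg], dim=1))
            # The updated memory slots carry state to the next segment.
            mem = x[:, :self.num_mem, :]
            outputs.append(x[:, self.num_mem:, :])
        return torch.cat(outputs, dim=1), mem

model = SimpleRecurrentMemoryModel()
tokens = torch.randint(0, 1000, (2, 256))   # 8 segments of 32 tokens each
hidden, final_mem = model(tokens)
print(hidden.shape, final_mem.shape)        # (2, 256, 64), (2, 4, 64)
```

Because each forward step attends over only one segment plus a fixed number of memory slots, the cost of reading the input grows linearly with its length; under this setup, a model with a short native context can, in principle, roll through inputs orders of magnitude longer than that context, which is the mechanism behind the 10^7-element results reported above.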