Associative Recurrent Memory Transformer

July 5, 2024
作者: Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev
cs.AI

Abstract

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, the Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives on associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark by answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.
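As a rough illustration of the segment-level recurrence described in the abstract, the following PyTorch-style sketch processes a long input in fixed-size segments and carries a set of memory tokens from one segment to the next, so the cost per step does not grow with total sequence length. It is a minimal sketch under assumed names (ARMTSketch, num_mem, seg_len are illustrative), not the authors' implementation, and it does not reproduce ARMT's associative memory update.

import torch
import torch.nn as nn

class ARMTSketch(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, num_mem=16, seg_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Learned initial memory tokens carried across segments.
        self.mem_init = nn.Parameter(torch.randn(1, num_mem, d_model))
        self.num_mem = num_mem
        self.seg_len = seg_len

    def forward(self, token_ids):
        # token_ids: (batch, total_len); processed segment by segment so the
        # per-segment cost is constant regardless of total sequence length.
        batch = token_ids.size(0)
        memory = self.mem_init.expand(batch, -1, -1)
        outputs = []
        for start in range(0, token_ids.size(1), self.seg_len):
            seg = self.embed(token_ids[:, start:start + self.seg_len])
            # Self-attention sees only [memory tokens; current segment].
            hidden = self.encoder(torch.cat([memory, seg], dim=1))
            # Updated memory tokens are handed to the next segment
            # (calling .detach() here would truncate backpropagation).
            memory = hidden[:, :self.num_mem, :]
            outputs.append(hidden[:, self.num_mem:, :])
        return torch.cat(outputs, dim=1), memory

# Example usage with a random 1024-token batch:
# out, mem = ARMTSketch()(torch.randint(0, 1000, (2, 1024)))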
