Associative Recurrent Memory Transformer

Abstract

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time to process new information at each time step. Our approach, the Associative Recurrent Memory Transformer (ARMT), relies on transformer self-attention for local context and on segment-level recurrence for storing task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark, answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.