Associative Recurrent Memory Transformer

Abstract

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time to process new information at each time step. Our approach, the Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storing task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark by answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.
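
The idea of segment-level recurrence with an associative store can be illustrated with a toy sketch. This is not the ARMT implementation: the transformer self-attention step is replaced by a trivial placeholder that reads (key, value) pairs directly, the memory is a plain outer-product matrix, and every name here (`split_into_segments`, `write`, `read`, `process_sequence`, `lookup`, `DIM`) is hypothetical. It only shows why the cost of handling a new segment can stay constant: the memory has a fixed size no matter how many segments came before.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Pair = Tuple[int, int]

DIM = 4  # assumed toy vocabulary size for keys and values


def split_into_segments(items: Sequence[Pair], segment_len: int) -> List[List[Pair]]:
    """Cut a long sequence into fixed-length segments (the last may be shorter)."""
    return [list(items[i:i + segment_len]) for i in range(0, len(items), segment_len)]


def one_hot(index: int, dim: int) -> Vector:
    return [1.0 if i == index else 0.0 for i in range(dim)]


def write(memory: Matrix, key: Vector, value: Vector) -> None:
    """Outer-product update: memory[r][c] += value[r] * key[c]. Cost is O(dim^2)."""
    for r, v in enumerate(value):
        row = memory[r]
        for c, k in enumerate(key):
            row[c] += v * k


def read(memory: Matrix, key: Vector) -> Vector:
    """Matrix-vector product memory @ key. Cost is O(dim^2)."""
    return [sum(m * k for m, k in zip(row, key)) for row in memory]


def process_sequence(pairs: Sequence[Pair], segment_len: int, dim: int) -> Matrix:
    """Walk the sequence segment by segment, carrying only the fixed-size memory."""
    memory: Matrix = [[0.0] * dim for _ in range(dim)]
    for segment in split_into_segments(pairs, segment_len):
        # Placeholder for the local model: a real system would run self-attention
        # over the segment and derive keys and values from its hidden states.
        for key_id, value_id in segment:
            write(memory, one_hot(key_id, dim), one_hot(value_id, dim))
    return memory


def lookup(memory: Matrix, key_id: int, dim: int) -> int:
    """Return the value id with the highest score for the given key id."""
    scores = read(memory, one_hot(key_id, dim))
    return max(range(dim), key=lambda i: scores[i])


pairs = [(0, 3), (1, 2), (2, 0), (3, 1)]
memory = process_sequence(pairs, segment_len=2, dim=DIM)
print(lookup(memory, 0, DIM))  # prints 3
```

With one-hot keys, retrieval is exact as long as each key is written once. Writing the same key twice would sum the two values in this toy version, which is one reason a practical associative memory needs some way to overwrite or correct stored entries.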