Associative Recurrent Memory Transformer

Abstract

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time to process new information at each time step. Our approach, the Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storing task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark by answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.
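To make the idea of segment-level recurrence with a fixed-size associative store more concrete, here is a minimal sketch. It is not the ARMT implementation: the abstract gives no architectural details, so everything below is an assumption chosen for illustration. The memory is a plain outer-product (linear) associative memory, keys and values are one-hot vectors, and the per-segment "processing" only writes key-value pairs, whereas a real model would run transformer layers over each segment. The names `split_into_segments`, `write`, `read`, `one_hot`, and `process` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def split_into_segments(items: Sequence, segment_len: int) -> List[Sequence]:
    """Cut a long sequence into fixed-length segments (the last may be shorter)."""
    return [items[i:i + segment_len] for i in range(0, len(items), segment_len)]


def write(memory: Matrix, key: Vector, value: Vector) -> None:
    """Add the outer product value * key^T to the memory matrix in place."""
    for i, v in enumerate(value):
        for j, k in enumerate(key):
            memory[i][j] += v * k


def read(memory: Matrix, key: Vector) -> Vector:
    """Return memory @ key; retrieval is exact when stored keys are orthonormal."""
    return [sum(m * k for m, k in zip(row, key)) for row in memory]


def one_hot(index: int, dim: int) -> Vector:
    """Illustrative stand-in for a learned key or value embedding."""
    return [1.0 if i == index else 0.0 for i in range(dim)]


def process(pairs: Sequence[Tuple[int, int]], segment_len: int, dim: int) -> Matrix:
    """Walk the sequence segment by segment, carrying one fixed-size memory.

    The memory is always dim x dim, so the cost of handling a new segment
    does not depend on how many segments came before it.
    """
    memory = [[0.0] * dim for _ in range(dim)]
    for segment in split_into_segments(pairs, segment_len):
        for key_id, value_id in segment:
            write(memory, one_hot(key_id, dim), one_hot(value_id, dim))
    return memory


DIM = 8
pairs = [(0, 3), (1, 5), (2, 7), (4, 1), (6, 2)]  # (key id, value id) facts
memory = process(pairs, segment_len=2, dim=DIM)

# Query the fact stored under key 2 after all segments have been processed.
retrieved = read(memory, one_hot(2, DIM))
answer = max(range(DIM), key=lambda i: retrieved[i])
print(answer)  # 7
```

The point of the sketch is only the shape of the computation: state of constant size is passed from one segment to the next, and a fact written in an early segment can still be looked up by its key after later segments have been processed.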