Associative Recurrent Memory Transformer
July 5, 2024
作者: Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev
cs.AI
Abstract
This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time to process new information at each time step. Our approach, the Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark, answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.