TransformerFAM: Feedback attention is working memory
April 14, 2024
Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
cs.AI
Abstract
While Transformers have revolutionized deep learning, their quadratic
attention complexity hinders their ability to process infinitely long inputs.
We propose Feedback Attention Memory (FAM), a novel Transformer architecture
that leverages a feedback loop to enable the network to attend to its own
latent representations. This design fosters the emergence of working memory
within the Transformer, allowing it to process indefinitely long sequences.
TransformerFAM requires no additional weights, enabling seamless integration
with pre-trained models. Our experiments show that TransformerFAM significantly
improves Transformer performance on long-context tasks across various model
sizes (1B, 8B, and 24B). These results showcase the potential to empower Large
Language Models (LLMs) to process sequences of unlimited length.
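To make the feedback-loop idea concrete, here is a minimal sketch (not the authors' implementation) of how an attention layer could carry a feedback-attention-memory (FAM) state across input blocks while reusing the existing query/key/value and output projections, so that no additional weights are introduced. The names `feedback_attention_block`, `w_qkv`, `w_out`, the FAM length `M`, and the block length `L` are illustrative assumptions; the paper further combines this feedback with block sliding-window attention, which the sketch omits.

```python
# Hypothetical simplification of feedback attention with a block-wise loop.
# A FAM state is carried across blocks: the FAM slots are concatenated with
# the current block, so block tokens can attend to the memory and the memory
# slots attend over both the block and their previous state, producing the
# next FAM. Only the shared projections w_qkv / w_out are used (no new params).
import torch
import torch.nn.functional as F

def feedback_attention_block(x_block, fam, w_qkv, w_out, num_heads):
    """One block step: x_block (B, L, D), fam (B, M, D)."""
    B, L, D = x_block.shape
    # Project block tokens and FAM slots with the *same* weights.
    qkv_in = torch.cat([x_block, fam], dim=1)            # (B, L+M, D)
    q, k, v = (qkv_in @ w_qkv).chunk(3, dim=-1)          # each (B, L+M, D)

    def split(t):  # (B, T, D) -> (B, H, T, D/H)
        return t.view(B, -1, num_heads, D // num_heads).transpose(1, 2)

    q, k, v = map(split, (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)        # (B, H, L+M, D/H)
    out = out.transpose(1, 2).reshape(B, L + fam.shape[1], D) @ w_out
    y_block = out[:, :L]      # contextualized block outputs
    fam_next = out[:, L:]     # updated working memory, fed back at the next step
    return y_block, fam_next

# Usage: iterate over blocks of an arbitrarily long sequence, carrying FAM.
if __name__ == "__main__":
    B, D, H, L, M = 2, 64, 4, 16, 8
    w_qkv = torch.randn(D, 3 * D) / D ** 0.5
    w_out = torch.randn(D, D) / D ** 0.5
    fam = torch.zeros(B, M, D)                           # initial working memory
    for x_block in torch.randn(B, 4 * L, D).split(L, dim=1):
        y, fam = feedback_attention_block(x_block, fam, w_qkv, w_out, H)
    print(y.shape, fam.shape)
```

Because the fixed-size FAM state is the only information carried forward between blocks, per-step compute and memory stay constant regardless of total sequence length, which is what lets the model process indefinitely long inputs in this scheme.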