TransformerFAM: Feedback attention is working memory
April 14, 2024
Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
cs.AI
Abstract
While Transformers have revolutionized deep learning, their quadratic
attention complexity hinders their ability to process infinitely long inputs.
We propose Feedback Attention Memory (FAM), a novel Transformer architecture
that leverages a feedback loop to enable the network to attend to its own
latent representations. This design fosters the emergence of working memory
within the Transformer, allowing it to process indefinitely long sequences.
TransformerFAM requires no additional weights, enabling seamless integration
with pre-trained models. Our experiments show that TransformerFAM significantly
improves Transformer performance on long-context tasks across various model
sizes (1B, 8B, and 24B). These results showcase the potential to empower Large
Language Models (LLMs) to process sequences of unlimited length.
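The abstract describes a feedback loop in which the model processes input block by block and attends to its own latent representations carried forward as memory. The sketch below is a minimal, single-layer illustration of that idea: block tokens attend over the current block plus the memory tokens, and the memory tokens are updated from the same context and fed back to the next block. The function names, shapes, block size, and the single-head, unmasked layout are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a feedback-attention-memory loop (illustrative only).
import torch
import torch.nn.functional as F

def attend(q, k, v):
    # Standard scaled dot-product attention (single head, no masking).
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def fam_forward(x, fam, block_size, w_qkv):
    # x:   (seq_len, d_model) input activations for one layer
    # fam: (num_fam, d_model) feedback memory carried across blocks
    outputs = []
    for start in range(0, x.shape[0], block_size):
        block = x[start:start + block_size]
        # Keys/values cover the current block plus the current memory.
        ctx = torch.cat([block, fam], dim=0)
        q, k, v = (ctx @ w for w in w_qkv)
        q_block, q_fam = q[:block.shape[0]], q[block.shape[0]:]
        # Block tokens read from the block and the memory ...
        outputs.append(attend(q_block, k, v))
        # ... and the memory tokens are updated from the same context:
        # the feedback loop that gives later blocks a compressed summary
        # of everything processed so far.
        fam = attend(q_fam, k, v)
    return torch.cat(outputs, dim=0), fam

d_model, block_size, num_fam = 64, 16, 4
w_qkv = [torch.randn(d_model, d_model) / d_model ** 0.5 for _ in range(3)]
x = torch.randn(128, d_model)
fam = torch.zeros(num_fam, d_model)
y, fam = fam_forward(x, fam, block_size, w_qkv)
print(y.shape, fam.shape)  # torch.Size([128, 64]) torch.Size([4, 64])
```

In this reading, the memory update reuses the layer's existing query/key/value projections, which is consistent with the abstract's claim that the design requires no additional weights.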