TransformerFAM: Feedback attention is working memory

Abstract

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.
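To make the feedback-loop idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the mechanism, so everything here is an assumption made for illustration: the block-wise processing, the `block_size` and `mem_len` parameters, the zero-initialized memory, the identity query/key/value projections, and the absence of causal masking. The function names `attend` and `feedback_attention` are hypothetical. The sketch only shows how a fixed-size memory that attends to the current block and to itself can carry information forward without any weights beyond those the attention already uses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    # Scaled dot-product attention with identity projections
    # (assumption: a real model would apply learned Q/K/V weights).
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ kv

def feedback_attention(x, block_size=4, mem_len=2):
    # x: (seq_len, d) array of token representations.
    # fam: hypothetical fixed-size feedback memory, zero-initialized (assumption).
    d = x.shape[1]
    fam = np.zeros((mem_len, d))
    outputs = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # The block attends to itself and to the memory from earlier blocks.
        context = np.concatenate([block, fam], axis=0)
        outputs.append(attend(block, context))
        # Feedback step: the memory attends to the same context (including
        # its own previous state) and reuses the same attention function.
        fam = attend(fam, context)
    return np.concatenate(outputs, axis=0), fam

x = np.random.default_rng(0).normal(size=(10, 8))
out, fam = feedback_attention(x)
print(out.shape, fam.shape)
```

In this toy version the cost per block depends only on `block_size` and `mem_len`, not on the total sequence length, which is the property that lets such a scheme scale to arbitrarily long inputs.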
