Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Abstract


Linear attention Transformers and their gated variants are valued for enabling parallel training and efficient recurrent inference, but they still fall short of standard Transformers on recall-intensive tasks and require substantial resources to train from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA consists of a two-layer GLA linked via softmax, using context-aware memory reading and adaptive forgetting to improve memory capacity while keeping the recurrent state compact. This design substantially improves both training and inference efficiency through GLA's hardware-efficient training algorithm and the reduced state size. Additionally, retaining the softmax operation is particularly beneficial in the "finetuning pretrained Transformers to RNNs" (T2R) setting, reducing the need for extensive training from scratch. Extensive experiments confirm GSA's superior performance in scenarios requiring in-context recall and in T2R settings.
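To make "bounded slot memory with adaptive forgetting and a softmax read" concrete, here is a minimal sketch of one recurrent (inference-time) step for a single head. It assumes an ABC-style memory of `m` slot keys and `m` slot values, where a per-slot forget gate `alpha` in (0, 1) decays each slot and `1 - alpha` is used as the write weight for the incoming token. That exact parameterization is an assumption made for illustration, not something stated above. The class `GatedSlotMemory` and its methods are hypothetical names. Input projections, feature maps, multiple heads, and the hardware-efficient chunkwise training algorithm are all omitted. The keys pass and the values pass each follow a gated linear recurrence, and a softmax over slot scores links them, loosely mirroring the "two-layer GLA linked via softmax" description.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GatedSlotMemory:
    """Hypothetical single-head, GSA-style recurrent state with a fixed number of slots."""

    def __init__(self, num_slots, d_k, d_v):
        self.m = num_slots
        # Slot keys (m x d_k) and slot values (m x d_v), initialised to zero.
        self.K = [[0.0] * d_k for _ in range(num_slots)]
        self.V = [[0.0] * d_v for _ in range(num_slots)]

    def step(self, q, k, v, alpha):
        """Process one token and return the output vector.

        alpha: assumed per-slot forget gate in (0, 1); (1 - alpha) is the write weight.
        """
        for i in range(self.m):
            a = alpha[i]
            # Adaptive forgetting: decay the old slot contents, blend in the new token.
            self.K[i] = [a * old + (1 - a) * new for old, new in zip(self.K[i], k)]
            self.V[i] = [a * old + (1 - a) * new for old, new in zip(self.V[i], v)]
        # Context-aware read: score every slot key against the query, then softmax.
        scores = [sum(qj * kj for qj, kj in zip(q, row)) for row in self.K]
        probs = softmax(scores)
        # The output is a convex combination of the slot values.
        d_v = len(self.V[0])
        return [sum(probs[i] * self.V[i][j] for i in range(self.m)) for j in range(d_v)]

    def state_size(self):
        # Number of floats in the recurrent state; independent of sequence length.
        return self.m * (len(self.K[0]) + len(self.V[0]))


# Usage: run 10 random tokens through a 4-slot memory.
random.seed(0)
mem = GatedSlotMemory(num_slots=4, d_k=3, d_v=2)
for _ in range(10):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    gates = [sigmoid(random.gauss(0.0, 1.0)) for _ in range(4)]
    out = mem.step(x, x, x[:2], gates)

print(mem.state_size())  # 4 * (3 + 2) = 20 floats, however long the sequence
```

Because the state holds a fixed `m * (d_k + d_v)` floats rather than a key/value cache that grows with every token, per-step inference cost stays constant in sequence length. This is the "compact recurrent state" property the abstract refers to, shown here under the stated assumptions.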
