MoM: Linear Sequence Modeling with Mixture-of-Memories

Abstract

Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. However, these methods typically compress the entire input sequence into a single fixed-size memory state, which leads to suboptimal performance on recall-intensive downstream tasks. Drawing inspiration from neuroscience, particularly the brain's ability to maintain robust long-term memory while mitigating "memory interference", we introduce a novel architecture called Mixture-of-Memories (MoM). MoM utilizes multiple independent memory states, with a router network directing input tokens to specific memory states. This approach greatly enhances the overall memory capacity while minimizing memory interference. As a result, MoM performs exceptionally well on recall-intensive tasks, surpassing existing linear sequence modeling techniques. Despite incorporating multiple memory states, the computation of each memory state remains linear in complexity, allowing MoM to retain the linear-complexity advantage during training while achieving constant complexity during inference. Our experimental results show that MoM significantly outperforms current linear sequence models on downstream language tasks, particularly recall-intensive tasks, and even achieves performance comparable to Transformer models. The code is released at https://github.com/OpenSparseLLMs/MoM and is also released as part of https://github.com/OpenSparseLLMs/Linear-MoE.
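The routing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `MixtureOfMemoriesSketch`, the top-1 routing rule, the randomly initialized linear router, and the additive outer-product memory update (as in basic linear attention) are all assumptions chosen for illustration. It only shows how sending each token to one of several independent fixed-size memories keeps the per-token cost independent of sequence length.

```python
import random


class MixtureOfMemoriesSketch:
    """Toy illustration: several independent memory matrices plus a router.

    Assumptions (not taken from the paper): top-1 routing, a random linear
    router, and the plain linear-attention update M += k v^T.
    """

    def __init__(self, num_memories, d_in, d_k, d_v, seed=0):
        rng = random.Random(seed)
        # Hypothetical router: one weight vector (length d_in) per memory.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(d_in)]
                       for _ in range(num_memories)]
        # Each memory is its own d_k x d_v matrix, initialized to zero.
        self.memories = [[[0.0] * d_v for _ in range(d_k)]
                         for _ in range(num_memories)]

    def route(self, x):
        """Return the index of the memory with the highest router score."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        return max(range(len(scores)), key=scores.__getitem__)

    def step(self, x, q, k, v):
        """Process one token: update only the routed memory, then read it."""
        m = self.route(x)
        memory = self.memories[m]
        # Write: add the outer product k v^T to the selected memory only.
        for i in range(len(k)):
            for j in range(len(v)):
                memory[i][j] += k[i] * v[j]
        # Read: output = q @ memory, a vector of length d_v.
        out = [sum(q[i] * memory[i][j] for i in range(len(q)))
               for j in range(len(memory[0]))]
        return m, out


# Usage: feed tokens one at a time; work per token does not grow with position.
mom = MixtureOfMemoriesSketch(num_memories=3, d_in=2, d_k=2, d_v=2)
for token in ([1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]):
    index, output = mom.step(x=token, q=token, k=token, v=token)
    print(index, output)
```

Because tokens routed to different memories never write to the same matrix, their key-value pairs cannot overwrite each other in this sketch, which is the intuition behind reduced memory interference. Each step touches one fixed-size matrix, so the per-token cost at inference stays constant under these assumptions.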
