
MoM: Linear Sequence Modeling with Mixture-of-Memories

February 19, 2025
Authors: Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng
cs.AI

Abstract

Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. However, these methods typically compress the entire input sequence into a single fixed-size memory state, which leads to suboptimal performance on recall-intensive downstream tasks. Drawing inspiration from neuroscience, particularly the brain's ability to maintain robust long-term memory while mitigating "memory interference", we introduce a novel architecture called Mixture-of-Memories (MoM). MoM utilizes multiple independent memory states, with a router network directing input tokens to specific memory states. This approach greatly enhances the overall memory capacity while minimizing memory interference. As a result, MoM performs exceptionally well on recall-intensive tasks, surpassing existing linear sequence modeling techniques. Despite incorporating multiple memory states, the computation for each memory state remains linear in complexity, allowing MoM to retain linear complexity during training and constant complexity during inference. Our experimental results show that MoM significantly outperforms current linear sequence models on downstream language tasks, particularly recall-intensive tasks, and even achieves performance comparable to Transformer models. The code is released at https://github.com/OpenSparseLLMs/MoM and is also included as part of https://github.com/OpenSparseLLMs/Linear-MoE.

