DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
February 26, 2024
Authors: Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
cs.AI
Abstract
Large language models (LLMs) face a daunting challenge due to the excessive
computational and memory requirements of the commonly used Transformer
architecture. While state space models (SSMs) are a new type of foundational
network architecture offering lower computational complexity, their performance
has yet to fully rival that of Transformers. This paper introduces DenseSSM, a
novel approach to enhance the flow of hidden information between layers in
SSMs. By selectively integrating shallow-layer hidden states into deeper layers,
DenseSSM retains fine-grained information crucial for the final output. Despite the
added dense connections, DenseSSM still maintains training parallelizability and
inference efficiency. The proposed method is broadly applicable to various SSM types
such as RetNet and Mamba. At a similar model size, DenseSSM achieves significant
improvements, exemplified by DenseRetNet outperforming the original RetNet by up to
5% in accuracy on public benchmarks.
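
To make the dense hidden-connection idea concrete, below is a minimal PyTorch-style sketch of how shallow-layer hidden states could be selectively fused into a deeper layer's hidden state. It assumes a simple gated, projected, elementwise fusion; the class name DenseHiddenFusion and its parameters are illustrative placeholders, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DenseHiddenFusion(nn.Module):
    """Gated fusion of shallow-layer hidden states into the current layer's
    hidden state (illustrative sketch, not the paper's official code)."""

    def __init__(self, hidden_dim: int, num_shallow: int):
        super().__init__()
        # One projection and one gate per retained shallow layer: the gate acts
        # as the "selective" mechanism deciding how much of each shallow hidden
        # state is injected into the deeper layer.
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_shallow)]
        )
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
             for _ in range(num_shallow)]
        )

    def forward(self, current_hidden: torch.Tensor,
                shallow_hiddens: list[torch.Tensor]) -> torch.Tensor:
        # Add each projected, gated shallow-layer state to the current hidden
        # state so fine-grained low-level features reach deeper layers.
        fused = current_hidden
        for proj, gate, h in zip(self.projections, self.gates, shallow_hiddens):
            fused = fused + gate(h) * proj(h)
        return fused


if __name__ == "__main__":
    # Toy usage: fuse hidden states from the two preceding layers.
    fusion = DenseHiddenFusion(hidden_dim=64, num_shallow=2)
    h_current = torch.randn(4, 16, 64)          # (batch, seq_len, hidden_dim)
    h_shallow = [torch.randn(4, 16, 64) for _ in range(2)]
    out = fusion(h_current, h_shallow)
    print(out.shape)  # torch.Size([4, 16, 64])
```

Because this fusion is a per-token elementwise operation on hidden states that are already computed, it would not interfere with the parallel training form or the recurrent inference of the underlying SSM, which is consistent with the abstract's claim that DenseSSM preserves training parallelizability and inference efficiency.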