DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Abstract

Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. State space models (SSMs) are a newer type of foundational network architecture with lower computational complexity, but their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach that enhances the flow of hidden information between layers in SSMs. By selectively integrating shallow-layer hidden states into deeper layers, DenseSSM retains fine-grained information that is crucial for the final output. Despite the added dense connections, DenseSSM preserves training parallelizability and inference efficiency. The proposed method is widely applicable to various SSM types, such as RetNet and Mamba. At a similar model size, DenseSSM achieves significant improvements; for example, DenseRetNet outperforms the original RetNet by up to 5% in accuracy on public benchmarks.
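To make the idea of "selectively integrating shallow-layer hidden states into deeper layers" concrete, the following is a minimal toy sketch and not the paper's implementation. The abstract does not specify the fusion rule, so the sigmoid gating, the function name `fuse_hidden_states`, and the parameter `gate_logits` are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_hidden_states(current, shallow_states, gate_logits):
    """Add gated copies of shallower layers' hidden states to the current one.

    Hypothetical helper for illustration only (the gating scheme is assumed).

    current        -- hidden state of the current (deeper) layer, list of floats
    shallow_states -- hidden states of earlier layers, each as long as `current`
    gate_logits    -- one assumed scalar logit per shallow layer; the sigmoid
                      of the logit decides how much of that layer is mixed in
    """
    fused = list(current)  # copy so the caller's list is not modified
    for state, logit in zip(shallow_states, gate_logits):
        gate = sigmoid(logit)
        for i, value in enumerate(state):
            fused[i] += gate * value
    return fused
```

For example, `fuse_hidden_states([1.0, 2.0], [[1.0, 1.0]], [0.0])` returns `[1.5, 2.5]`, because a logit of 0 gives a gate of 0.5. A strongly negative logit drives the gate toward 0, so that shallow layer contributes almost nothing, which is one simple way to read "selective" integration.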
