

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

February 26, 2024
Authors: Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
cs.AI

Abstract

Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. While state space models (SSMs) are a new type of foundational network architecture offering lower computational complexity, their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach for enhancing the flow of hidden information between layers in SSMs. By selectively integrating shallow-layer hidden states into deeper layers, DenseSSM retains fine-grained information crucial to the final output. Despite the added dense connections, DenseSSM preserves training parallelizability and inference efficiency. The proposed method is broadly applicable to various SSM types such as RetNet and Mamba. At a similar model size, DenseSSM achieves significant improvements, exemplified by DenseRetNet outperforming the original RetNet by up to 5% accuracy on public benchmarks.
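
The abstract describes the core mechanism only at a high level. Below is a minimal PyTorch sketch of what a dense hidden connection could look like: hidden states from the m preceding (shallower) layers are projected and selectively gated into the current layer's hidden state. The module name `DenseHiddenFusion`, the projection-plus-sigmoid-gate fusion, and all hyperparameters are illustrative assumptions based on the abstract's description, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DenseHiddenFusion(nn.Module):
    """Hypothetical sketch of a dense hidden connection for SSM layers.

    Hidden states from the m preceding (shallower) layers are projected,
    gated, and summed into the current layer's hidden state, so that
    fine-grained shallow-layer information reaches deeper layers. Names
    and internals are assumptions, not the authors' exact design.
    """

    def __init__(self, d_model: int, m: int = 2):
        super().__init__()
        self.m = m
        # One projection and one gate per collected shallow layer.
        self.proj = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(m)])
        self.gate = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(m)])

    def forward(self, h: torch.Tensor, shallow: list[torch.Tensor]) -> torch.Tensor:
        # h: current layer's hidden state, shape (batch, seq_len, d_model).
        # shallow: hidden states of up to m preceding layers, same shape.
        fused = h
        for proj, gate, h_prev in zip(self.proj, self.gate, shallow[-self.m:]):
            # "Selective" integration: a per-channel sigmoid gate decides how
            # much of the projected shallow state to inject into this layer.
            fused = fused + torch.sigmoid(gate(h_prev)) * proj(h_prev)
        return fused

# Usage: fuse the two previous layers' hidden states into the current one.
fusion = DenseHiddenFusion(d_model=512, m=2)
h = torch.randn(1, 16, 512)
prev = [torch.randn(1, 16, 512), torch.randn(1, 16, 512)]
out = fusion(h, prev)  # same shape as h: (1, 16, 512)
```

Note that because this fusion is a fixed elementwise operation applied position by position, it would not break the recurrent/parallel duality of SSMs, which is consistent with the abstract's claim that training parallelizability and inference efficiency are preserved.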