DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
February 26, 2024
Authors: Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
cs.AI
Abstract
Large language models (LLMs) face a daunting challenge due to the excessive
computational and memory requirements of the commonly used Transformer
architecture. While state space models (SSMs) are a new type of foundational
network architecture offering lower computational complexity, their performance
has yet to fully rival that of Transformers. This paper introduces DenseSSM, a
novel approach to enhance the flow of hidden information between layers in
SSMs. By selectively integrating shallow-layer hidden states into deeper layers,
DenseSSM retains fine-grained information crucial for the final output. Despite the
added dense connections, DenseSSM still maintains training parallelizability and
inference efficiency. The proposed method is broadly applicable to various SSM types
such as RetNet and Mamba. At a similar model size, DenseSSM achieves significant
improvements, exemplified by DenseRetNet outperforming the original RetNet by up to
5% in accuracy on public benchmarks.
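
To make the dense hidden-connection idea concrete, below is a minimal PyTorch-style sketch of how shallow-layer hidden states could be selectively fused into a deeper layer's hidden state. It assumes a simple gated, projected, elementwise fusion; the class name DenseHiddenFusion and its parameters are illustrative placeholders, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DenseHiddenFusion(nn.Module):
    """Gated fusion of shallow-layer hidden states into the current layer's
    hidden state (illustrative sketch, not the paper's official code)."""

    def __init__(self, hidden_dim: int, num_shallow: int):
        super().__init__()
        # One projection and one gate per retained shallow layer: the gate acts
        # as the "selective" mechanism deciding how much of each shallow hidden
        # state is injected into the deeper layer.
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_shallow)]
        )
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())
             for _ in range(num_shallow)]
        )

    def forward(self, current_hidden: torch.Tensor,
                shallow_hiddens: list[torch.Tensor]) -> torch.Tensor:
        # Add each projected, gated shallow-layer state to the current hidden
        # state so fine-grained low-level features reach deeper layers.
        fused = current_hidden
        for proj, gate, h in zip(self.projections, self.gates, shallow_hiddens):
            fused = fused + gate(h) * proj(h)
        return fused


if __name__ == "__main__":
    # Toy usage: fuse hidden states from the two preceding layers.
    fusion = DenseHiddenFusion(hidden_dim=64, num_shallow=2)
    h_current = torch.randn(4, 16, 64)          # (batch, seq_len, hidden_dim)
    h_shallow = [torch.randn(4, 16, 64) for _ in range(2)]
    out = fusion(h_current, h_shallow)
    print(out.shape)  # torch.Size([4, 16, 64])
```

Because this fusion is a per-token elementwise operation on hidden states that are already computed, it would not interfere with the parallel training form or the recurrent inference of the underlying SSM, which is consistent with the abstract's claim that DenseSSM preserves training parallelizability and inference efficiency.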