DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Abstract

Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture. State space models (SSMs) are a new type of foundational network architecture offering lower computational complexity, but their performance has yet to fully rival that of Transformers. This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. By selectively integrating shallow-layer hidden states into deeper layers, DenseSSM retains fine-grained information crucial for the final output. Despite the added dense connections, DenseSSM retains training parallelizability and inference efficiency. The proposed method is widely applicable to various SSM types such as RetNet and Mamba. At similar model sizes, DenseSSM achieves significant improvements; for example, DenseRetNet outperforms the original RetNet by up to 5% in accuracy on public benchmarks.
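
The idea of selectively integrating shallow-layer hidden states into deeper layers can be illustrated with a minimal sketch. The abstract does not specify the fusion mechanism, so everything below is an assumption made for illustration: the function name `dense_fuse`, the per-layer projection and gate matrices, the sigmoid gate, the additive fusion, and the window size `m` are all hypothetical. The sketch also treats the per-layer hidden states as given, rather than computing them with an actual SSM.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_fuse(hidden_states, proj, gate, m=2):
    """Add gated projections of up to m shallower hidden states to each layer.

    hidden_states: list of vectors, ordered from shallow to deep (assumed given).
    proj, gate:    hypothetical per-layer square matrices (lists of rows).
    m:             hypothetical number of shallower layers to draw from.
    """
    fused = []
    for l, h in enumerate(hidden_states):
        # Gate computed from the current layer's own hidden state (assumption).
        g = [sigmoid(z) for z in matvec(gate[l], h)]
        out = list(h)
        for k in range(max(0, l - m), l):
            # Project the shallower state, then select from it elementwise.
            p = matvec(proj[l], hidden_states[k])
            out = [o + gi * pi for o, gi, pi in zip(out, g, p)]
        fused.append(out)
    return fused


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    states = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    # Zero gate weights give a constant gate of 0.5 for every element.
    print(dense_fuse(states, [identity] * 3, [zeros] * 3, m=2))
```

In this toy setting the first layer is unchanged because it has no shallower layer to draw from, while each deeper layer receives half of every shallower state within the window. Additive fusion is used here only because it keeps the hidden dimension fixed; other fusion choices would fit the abstract's description equally well.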
