Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Abstract

State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling. They incorporate gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities remain underexplored compared to those of Transformers. ICL is a remarkable emergent property of modern language models that enables them to perform tasks without parameter optimization. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks. Our results show that SSMs perform comparably to Transformers on standard regression ICL tasks and outperform them on tasks such as sparse parity learning. However, SSMs fall short on tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks and surpasses the individual models on tasks where each struggles independently. Our findings suggest that hybrid architectures offer promising avenues for enhancing ICL in language models.
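
To make the sparse parity setting more concrete, the sketch below shows one plausible way to build a single in-context prompt for it. The function name, the default sizes, and the ±1 input encoding are assumptions chosen for illustration; they are not taken from the paper's code.

```python
import random


def make_sparse_parity_prompt(n_bits=10, k=2, n_examples=8, seed=0):
    """Build one in-context sparse-parity task instance (illustrative only).

    A hidden subset of k coordinates is drawn once per prompt. Each label is
    the product of the inputs on that subset, with inputs in {-1, +1}.
    All names and default sizes are hypothetical.
    """
    rng = random.Random(seed)
    # Hidden relevant coordinates; the model never sees these directly.
    secret = sorted(rng.sample(range(n_bits), k))
    examples = []
    for _ in range(n_examples):
        x = [rng.choice((-1, 1)) for _ in range(n_bits)]
        y = 1
        for i in secret:
            y *= x[i]  # parity of the selected bits in +/-1 encoding
        examples.append((x, y))
    return secret, examples


secret, examples = make_sparse_parity_prompt()
# The first examples form the context; the last input is the query whose
# label the model must infer from the context alone, with no weight updates.
context, (query_x, query_y) = examples[:-1], examples[-1]
```

Because the hidden subset is resampled for every prompt, a model can only answer the query by inferring the relevant coordinates from the in-context examples, which is what makes this an ICL task rather than a memorization task.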
