Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
February 6, 2024
Authors: Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
cs.AI
Abstract
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed
as alternatives to Transformer networks in language modeling, by incorporating
gating, convolutions, and input-dependent token selection to mitigate the
quadratic cost of multi-head attention. Although SSMs exhibit competitive
performance, their in-context learning (ICL) capabilities, a remarkable
emergent property of modern language models that enables task execution without
parameter optimization, remain underexplored compared to Transformers. In this
study, we evaluate the ICL performance of SSMs, focusing on Mamba, against
Transformer models across various tasks. Our results show that SSMs perform
comparably to Transformers in standard regression ICL tasks, while
outperforming them in tasks like sparse parity learning. However, SSMs fall
short in tasks involving non-standard retrieval functionality. To address these
limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with
attention blocks, surpassing individual models in tasks where they struggle
independently. Our findings suggest that hybrid architectures offer promising
avenues for enhancing ICL in language models.
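For concreteness, here is a minimal sketch of how a standard in-context regression episode can be constructed, in the common Garg et al. (2022)-style setup of interleaving (x_i, y_i) pairs into one sequence so the model must predict labels from context alone, without parameter updates. The function name, dimensions, and token layout below are illustrative assumptions, not necessarily the paper's exact protocol.

```python
# Minimal sketch of an in-context linear regression episode (illustrative
# assumptions: dimensions, noise model, and token layout are not the paper's exact setup).
import torch

def sample_linear_regression_prompt(n_points=40, dim=20, noise_std=0.0):
    """Sample one in-context regression episode: (x_1, y_1, ..., x_k, y_k)."""
    w = torch.randn(dim)                                   # task vector, fixed within the episode
    xs = torch.randn(n_points, dim)                        # in-context inputs
    ys = xs @ w + noise_std * torch.randn(n_points)        # targets y_i = <w, x_i> (+ optional noise)

    # Interleave (x_i, y_i) pairs into one token sequence: x tokens carry the
    # input, y tokens carry the label in their first coordinate.
    y_tokens = torch.zeros(n_points, dim)
    y_tokens[:, 0] = ys
    prompt = torch.stack([xs, y_tokens], dim=1).reshape(2 * n_points, dim)
    return prompt, ys                                      # model predicts y_i from the prefix ending at x_i

# Example: a batch of episodes with shape (batch, seq_len, dim)
batch = torch.stack([sample_linear_regression_prompt()[0] for _ in range(8)])
print(batch.shape)  # torch.Size([8, 80, 20])
```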
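The hybrid idea of combining Mamba-style sequence mixing with attention can be pictured as a residual block that applies an SSM-style mixer followed by causal self-attention. The PyTorch sketch below is only an illustration under that assumption: `HybridBlock` and `CausalConvMixer` are hypothetical names, the pre-norm wiring is a generic choice rather than the paper's exact MambaFormer layout, and the stand-in mixer exists only so the example runs without a Mamba implementation (a drop-in SSM module such as `mamba_ssm.Mamba` could replace it).

```python
# Sketch of a hybrid block: SSM-style sequence mixer + causal self-attention,
# each with a pre-norm residual connection. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridBlock(nn.Module):
    """Pre-norm residual block: sequence mixer, then causal self-attention."""
    def __init__(self, d_model, n_heads, mixer: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = mixer                               # e.g. a Mamba block (assumed drop-in)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                                # x: (batch, seq, d_model)
        x = x + self.mixer(self.norm1(x))                # SSM / Mamba sub-block
        h = self.norm2(x)
        L = x.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf"), device=x.device), diagonal=1)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        return x + attn_out                              # attention sub-block

class CausalConvMixer(nn.Module):
    """Stand-in mixer (a causal depthwise convolution), NOT an SSM; used only
    so this sketch runs without the mamba_ssm package."""
    def __init__(self, d_model, kernel_size=4):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)

    def forward(self, x):                                # x: (batch, seq, d_model)
        x = F.pad(x.transpose(1, 2), (self.pad, 0))      # left-pad time axis for causality
        return self.conv(x).transpose(1, 2)

block = HybridBlock(d_model=64, n_heads=4, mixer=CausalConvMixer(64))
print(block(torch.randn(2, 16, 64)).shape)               # torch.Size([2, 16, 64])
```

Stacking several such blocks gives a model that retains the recurrent, sub-quadratic mixing of the SSM path while the attention path supplies the retrieval behavior the abstract reports as a weakness of pure SSMs.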