MambaByte: Token-free Selective State Space Model

Abstract

Token-free language models learn directly from raw bytes and remove the bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences, and standard autoregressive Transformers scale poorly in such settings. We experiment with MambaByte, a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. Our experiments indicate the computational efficiency of MambaByte compared to other byte-level models. We also find MambaByte to be competitive with and even outperform state-of-the-art subword Transformers. Furthermore, owing to linear scaling in length, MambaByte benefits from fast inference compared to Transformers. Our findings establish the viability of MambaByte in enabling token-free language modeling.
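
To make the "raw bytes" input concrete, the following minimal sketch shows what a token-free encoding could look like and why it lengthens sequences. It is an illustration only, not the authors' data pipeline: the helper names `to_byte_ids` and `from_byte_ids` and the sample string are hypothetical, and a whitespace split is used only as a crude stand-in for a coarser tokenization.

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return one integer ID (0-255) per byte."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Invert to_byte_ids; assumes the IDs form a valid UTF-8 sequence."""
    return bytes(ids).decode("utf-8")


# Hypothetical sample mixing ASCII (1 byte per character)
# with CJK characters (3 bytes per character in UTF-8).
sample = "Token-free 模型"

byte_ids = to_byte_ids(sample)      # byte-level sequence fed to the model
word_count = len(sample.split())    # crude whitespace split, for comparison only

print(f"characters: {len(sample)}")
print(f"bytes:      {len(byte_ids)}")
print(f"words:      {word_count}")
```

The vocabulary is fixed at 256 symbols and no tokenizer has to be learned, but the byte sequence is several times longer than the whitespace-split one. That gap is the sequence-length cost the abstract refers to.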