MambaByte: Token-free Selective State Space Model

January 24, 2024
Authors: Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush
cs.AI

Abstract

Token-free language models learn directly from raw bytes and remove the bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences, and standard autoregressive Transformers scale poorly in such settings. We experiment with MambaByte, a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. Our experiments indicate the computational efficiency of MambaByte compared to other byte-level models. We also find MambaByte to be competitive with and even outperform state-of-the-art subword Transformers. Furthermore, owing to linear scaling in length, MambaByte benefits from fast inference compared to Transformers. Our findings establish the viability of MambaByte in enabling token-free language modeling.
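
To make the byte-level recurrence concrete, here is a minimal sketch of autoregressive next-byte prediction with a selective state-space update. The class names (`SelectiveSSMBlock`, `ByteLM`), hyperparameters, and the naive sequential scan are illustrative assumptions, not the authors' implementation (Mamba uses a hardware-aware parallel scan); the sketch only shows why a fixed-size hidden state yields linear scaling in length and why no subword tokenizer is needed.

```python
# A minimal sketch of token-free, byte-level autoregressive modeling with a
# selective state-space recurrence, in the spirit of Mamba/MambaByte.
# All names, sizes, and the naive sequential scan are illustrative
# assumptions, NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSMBlock(nn.Module):
    """One SSM layer: h_t = exp(dt*A) * h_{t-1} + dt*B_t*x_t, y_t = C_t . h_t."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(d_model, d_state))  # diagonal A
        # "Selective": dt, B, C are functions of the current input.
        self.dt_proj = nn.Linear(d_model, d_model)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model). Sequential scan for clarity: O(length)
        # total, with a fixed-size state carried between steps.
        A = -torch.exp(self.A_log)                        # keep dynamics stable
        h = x.new_zeros(x.shape[0], *self.A_log.shape)    # (batch, d_model, d_state)
        ys = []
        for t in range(x.shape[1]):
            xt = x[:, t]                                          # (batch, d_model)
            dt = F.softplus(self.dt_proj(xt)).unsqueeze(-1)       # (batch, d_model, 1)
            B = self.B_proj(xt).unsqueeze(1)                      # (batch, 1, d_state)
            C = self.C_proj(xt).unsqueeze(1)                      # (batch, 1, d_state)
            h = torch.exp(dt * A) * h + dt * B * xt.unsqueeze(-1) # state update
            ys.append((h * C).sum(-1))                            # read out: (batch, d_model)
        return torch.stack(ys, dim=1)

class ByteLM(nn.Module):
    """Next-byte language model: the vocabulary is just the 256 byte values."""
    def __init__(self, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)  # no tokenizer, one row per byte
        self.layers = nn.ModuleList(SelectiveSSMBlock(d_model) for _ in range(n_layers))
        self.head = nn.Linear(d_model, 256)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(byte_ids)
        for layer in self.layers:
            x = x + layer(x)                     # residual connection
        return self.head(x)                      # (batch, length, 256) logits

# Usage: train on raw UTF-8 bytes directly.
ids = torch.tensor([list("token-free modeling".encode("utf-8"))])
model = ByteLM()
logits = model(ids)
loss = F.cross_entropy(logits[:, :-1].reshape(-1, 256), ids[:, 1:].reshape(-1))
```

Because each step updates a fixed-size state, generation costs a constant amount of work per byte regardless of context length; this is the source of the inference-speed advantage over Transformers, whose per-step attention cost grows with the context.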