MambaByte: Token-free Selective State Space Model

Abstract

Token-free language models learn directly from raw bytes and remove the bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences, and standard autoregressive Transformers scale poorly in this setting. We experiment with MambaByte, a token-free adaptation of the Mamba state space model (SSM) trained autoregressively on byte sequences. Our experiments show that MambaByte is computationally efficient compared to other byte-level models. We also find MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers. Furthermore, because its cost scales linearly with sequence length, MambaByte achieves faster inference than Transformers. Our findings establish the viability of MambaByte for token-free language modeling.
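The two efficiency claims above (bytes give longer sequences; a state space model processes them at constant cost per step) can be illustrated with a minimal sketch. This is not the MambaByte implementation: it is a hypothetical single-layer, diagonal selective SSM in pure Python with random, untrained weights, and it assumes a simplified discretization (exp(delta * A) for the state matrix, delta * B for the input matrix). The names `to_byte_ids`, `ToySelectiveSSM`, and `run` are illustrative only. Text is encoded as UTF-8, so each Hangul syllable becomes three inputs, and the recurrence carries a fixed-size state, so each new byte costs the same amount of work no matter how many bytes came before.

```python
import math
import random


def to_byte_ids(text: str) -> list[int]:
    """Token-free input: every UTF-8 byte is one symbol from a 256-entry vocabulary."""
    return list(text.encode("utf-8"))


def softplus(z: float) -> float:
    """Numerically stable softplus, used to keep the step size delta positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class ToySelectiveSSM:
    """Hypothetical Mamba-style selective SSM: one layer, diagonal A, random weights."""

    def __init__(self, d_model: int = 4, d_state: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.d_model, self.d_state = d_model, d_state
        # Embedding table: one d_model-vector per possible byte value.
        self.embed = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(256)]
        # Diagonal A with negative entries, so exp(delta * A) < 1 and the state decays.
        self.A = [[-(n + 1.0) for n in range(d_state)] for _ in range(d_model)]
        # "Selective" parameters: delta, B, and C are computed from the current input.
        self.w_delta = [rng.gauss(0, 0.5) for _ in range(d_model)]
        self.W_B = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_state)]
        self.W_C = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_state)]

    def init_state(self) -> list[list[float]]:
        # Fixed-size hidden state: d_model x d_state, independent of sequence length.
        return [[0.0] * self.d_state for _ in range(self.d_model)]

    def step(self, state, byte_id: int):
        """Consume one byte in O(d_model * d_state) time and update the state in place."""
        x = self.embed[byte_id]
        B = [sum(w * xi for w, xi in zip(row, x)) for row in self.W_B]
        C = [sum(w * xi for w, xi in zip(row, x)) for row in self.W_C]
        y = []
        for d in range(self.d_model):
            delta = softplus(self.w_delta[d] * x[d])
            for n in range(self.d_state):
                a_bar = math.exp(delta * self.A[d][n])  # discretized state transition
                state[d][n] = a_bar * state[d][n] + delta * B[n] * x[d]
            y.append(sum(C[n] * state[d][n] for n in range(self.d_state)))
        return state, y


def run(model: ToySelectiveSSM, text: str):
    """Scan the byte sequence left to right; total cost is linear in its length."""
    state = model.init_state()
    outputs = []
    for b in to_byte_ids(text):
        state, y = model.step(state, b)
        outputs.append(y)
    return state, outputs


if __name__ == "__main__":
    text = "MambaByte 바이트"
    state, outputs = run(ToySelectiveSSM(), text)
    print(len(text), "characters ->", len(outputs), "bytes")  # 13 characters -> 19 bytes
```

Here the 13 characters of `"MambaByte 바이트"` become 19 byte IDs, while the state stays 4 x 8 however long the input grows. A Transformer decoder, by contrast, attends over a key-value cache that grows with every generated byte, which is the scaling problem the abstract refers to.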
