Fast Byte Latent Transformer

Abstract

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) with new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction loss. This objective enables an inference procedure that generates multiple bytes in parallel per decoding step, substantially reducing the number of forward passes required to generate a sequence. Second, we propose two extensions inspired by speculative decoding that trade some of this speed for higher generation quality: BLT Self-speculation (BLT-S), in which BLT's local decoder continues generating past its normal patch boundaries to draft bytes that are then verified with a single full-model forward pass; and BLT Diffusion+Verification (BLT-DV), which augments BLT-D with an autoregressive verification step after diffusion-based generation. All three methods can achieve an estimated memory-bandwidth cost more than 50% lower than BLT on generation tasks. Each approach offers distinct advantages, and together they remove key barriers to the practical use of byte-level LMs.
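
The draft-then-verify idea behind the speculative-decoding-style variants can be illustrated with a minimal toy sketch. This is not the BLT implementation: `draft_then_verify`, `toy_target`, and `toy_draft` are hypothetical names, both "models" are plain Python callables rather than neural networks, and the greedy exact-match acceptance rule is an assumption made for illustration. In a real system the verification of all drafted positions would be a single batched forward pass of the full model; here it is emulated with per-position calls.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a byte prefix to its greedy next byte.
NextByte = Callable[[bytes], int]


def draft_then_verify(prefix: bytes, draft: NextByte, target: NextByte,
                      k: int, max_new: int) -> Tuple[bytes, int]:
    """Generate max_new bytes; return (output, number of verification rounds)."""
    out = bytearray(prefix)
    goal = len(prefix) + max_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1) Draft up to k bytes with the cheap model.
        n = min(k, goal - len(out))
        ctx = bytearray(out)
        proposal: List[int] = []
        for _ in range(n):
            b = draft(bytes(ctx))
            proposal.append(b)
            ctx.append(b)
        # 2) Verify each drafted position against the target model
        #    (emulates one batched full-model pass).
        for i, b in enumerate(proposal):
            t = target(bytes(out) + bytes(proposal[:i]))
            if t != b:
                # Keep the agreeing prefix, then take the target's byte instead.
                out.extend(proposal[:i])
                out.append(t)
                break
        else:
            out.extend(proposal)  # every drafted byte was accepted
    return bytes(out), rounds


# Toy models (assumptions): the target repeats a fixed pattern, and the
# draft agrees with it except at every fifth position.
PATTERN = b"byte latent "


def toy_target(p: bytes) -> int:
    return PATTERN[len(p) % len(PATTERN)]


def toy_draft(p: bytes) -> int:
    return ord("?") if len(p) % 5 == 4 else toy_target(p)


text, rounds = draft_then_verify(b"", toy_draft, toy_target, k=4, max_new=24)
print(text, rounds)
```

Under this acceptance rule the output is identical to plain greedy decoding with the target alone, while the number of verification rounds is smaller than the number of generated bytes whenever the draft is frequently right.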
