Fast Byte Latent Transformer

May 8, 2026
Authors: Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz, Gargi Ghosh, Luke Zettlemoyer, Christopher Potts, Xiaochuang Han, Srinivasan Iyer
cs.AI

Abstract

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction loss. This enables an inference procedure that generates multiple bytes in parallel per decoding step, substantially reducing the number of forward passes required to generate a sequence. Second, we propose two extensions inspired by speculative decoding that trade some of this speed for higher generation quality: BLT Self-speculation (BLT-S), in which BLT's local decoder continues generating past its normal patch boundaries to draft bytes, which are then verified with a single full-model forward pass; and BLT Diffusion+Verification (BLT-DV), which augments BLT-D with an autoregressive verification step after diffusion-based generation. All methods may achieve an estimated memory-bandwidth cost over 50% lower than BLT on generation tasks. Each approach offers its own unique advantages, together removing key barriers to the practical use of byte-level LMs.