BitDance: Scaling Autoregressive Generative Models with Binary Tokens

February 15, 2026
Authors: Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Huaibo Huang, Xiangyu Yue, Hao Chen
cs.AI

Abstract

We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to 2^{256} states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance beats state-of-the-art parallel AR models that use 1.4B parameters while using 5.4x fewer parameters (260M) and achieving an 8.7x speedup. For text-to-image generation, BitDance trains on large-scale multimodal tokens and generates high-resolution, photorealistic images efficiently, showing strong performance and favorable scaling. When generating 1024x1024 images, BitDance achieves a speedup of over 30x compared to prior AR models. We release code and models to facilitate further research on AR foundation models. Code and models are available at: https://github.com/shallowdream204/BitDance.
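The idea of replacing a softmax over token indices with continuous-space diffusion can be illustrated with a toy sketch. This is not the BitDance implementation: the real binary diffusion head is a learned network conditioned on the AR transformer's output, whereas here an exact closed-form denoiser for a hypothetical mixture of ±1 "prototype" tokens stands in for it. The names `posterior_mean`, `sample_binary_token`, `BITS_PER_TOKEN`, the prototypes, the weights, and the linear noise schedule are all illustrative assumptions, and the sampler is a standard deterministic DDIM-style update rather than whatever schedule the paper uses. Next-patch diffusion (parallel multi-token prediction) is not covered.

```python
import numpy as np

# A 256-bit binary token can take 2**256 distinct values. A softmax head would
# need one logit per value, far beyond any realistic output layer
# (compare a hypothetical 16-bit codebook with 2**16 = 65,536 entries).
BITS_PER_TOKEN = 256
NUM_STATES = 2 ** BITS_PER_TOKEN  # exact Python int with 78 decimal digits


def posterior_mean(x_t, alpha_bar, prototypes, log_weights):
    """Exact E[x0 | x_t] for a toy prior: x0 is one of K prototype vectors in
    {-1, +1}^d (picked with the given weights), and
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a = np.sqrt(alpha_bar)
    var = 1.0 - alpha_bar
    # Every +/-1 prototype has the same norm, so the Gaussian likelihood ratio
    # between components depends only on the inner product <x_t, c_k>.
    logits = log_weights + a * (prototypes @ x_t) / var
    logits -= logits.max()  # stabilise the softmax over the K components
    post = np.exp(logits)
    post /= post.sum()
    return post @ prototypes


def sample_binary_token(prototypes, weights, steps=50, seed=0):
    """Deterministic DDIM-style sampling in continuous space, then snap to bits.
    The closed-form denoiser is a stand-in for a learned diffusion head."""
    rng = np.random.default_rng(seed)
    d = prototypes.shape[1]
    log_w = np.log(weights)
    # Hypothetical schedule: alpha_bar rises from almost pure noise to almost clean.
    alpha_bars = np.linspace(1e-4, 0.9999, steps)
    x = rng.standard_normal(d)  # start from Gaussian noise, never from an index
    for ab, ab_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        x0_hat = posterior_mean(x, ab, prototypes, log_w)
        eps_hat = (x - np.sqrt(ab) * x0_hat) / np.sqrt(1.0 - ab)
        # DDIM update with eta = 0: move to the next, less noisy level.
        x = np.sqrt(ab_next) * x0_hat + np.sqrt(1.0 - ab_next) * eps_hat
    x0_hat = posterior_mean(x, alpha_bars[-1], prototypes, log_w)
    return (x0_hat > 0).astype(np.int8)  # continuous output -> {0, 1} bits


# Usage: 4 hypothetical candidate tokens of 256 bits each.
rng = np.random.default_rng(123)
prototypes = rng.choice([-1.0, 1.0], size=(4, BITS_PER_TOKEN))
weights = np.array([0.4, 0.3, 0.2, 0.1])
bits = sample_binary_token(prototypes, weights)
print(bits.shape, bits[:16])
```

In this toy the sampler only ever manipulates a 256-dimensional real vector and thresholds it at the end, so it never has to enumerate the 2^256 possible token values the way a softmax classifier would.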