BitDance: Scaling Autoregressive Generative Models with Binary Tokens
February 15, 2026
Authors: Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Huaibo Huang, Xiangyu Yue, Hao Chen
cs.AI
Abstract
We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to 2^{256} states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with a standard classification head. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256×256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance outperforms state-of-the-art parallel AR models that use 1.4B parameters, while using 5.4× fewer parameters (260M) and achieving an 8.7× speedup. For text-to-image generation, BitDance trains on large-scale multimodal tokens and efficiently generates high-resolution, photorealistic images, showing strong performance and favorable scaling. When generating 1024×1024 images, BitDance achieves a speedup of over 30× compared to prior AR models. We release code and models to facilitate further research on AR foundation models. Code and models are available at: https://github.com/shallowdream204/BitDance.
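To make the binary-diffusion-head idea concrete, here is a minimal, self-contained sketch of how one might sample a 256-bit token via continuous-space diffusion and then binarize. The class name, the MLP denoiser, the step count, and the Euler-style update are our assumptions for illustration, not the released BitDance architecture.

```python
import torch

TOKEN_DIM = 256   # bits per binary token (a token can encode up to 2^256 states)
NUM_STEPS = 50    # assumed number of denoising steps

class BinaryDiffusionHead(torch.nn.Module):
    """Toy continuous-space diffusion head: instead of a softmax over a
    codebook, it iteratively denoises a continuous vector conditioned on
    the AR backbone's hidden state, then binarizes with sign()."""

    def __init__(self, hidden_dim: int = 1024):
        super().__init__()
        # Small MLP denoiser: (noisy token, condition, timestep) -> noise estimate.
        self.denoiser = torch.nn.Sequential(
            torch.nn.Linear(TOKEN_DIM + hidden_dim + 1, 1024),
            torch.nn.SiLU(),
            torch.nn.Linear(1024, TOKEN_DIM),
        )

    @torch.no_grad()
    def sample(self, cond: torch.Tensor) -> torch.Tensor:
        """Sample one binary token per condition vector (cond: [B, hidden_dim])."""
        x = torch.randn(cond.size(0), TOKEN_DIM)          # start from Gaussian noise
        for step in reversed(range(NUM_STEPS)):
            t = torch.full((cond.size(0), 1), step / NUM_STEPS)
            eps = self.denoiser(torch.cat([x, cond, t], dim=-1))
            x = x - eps / NUM_STEPS                       # crude Euler-style update
        return torch.sign(x)                              # binarize each entry to +/-1

# Usage: one sampling pass yields a {-1, +1}^256 token per sequence position.
head = BinaryDiffusionHead()
hidden = torch.randn(4, 1024)        # stand-in for AR transformer outputs
tokens = head.sample(hidden)         # shape [4, 256], entries in {-1, +1}
```

The point of the design, as the abstract describes it, is that the head never enumerates the 2^256 possible states; it works in a continuous 256-dimensional space and commits to bits only at the end.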
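Next-patch diffusion, as described above, replaces token-by-token decoding with patch-level steps: the backbone conditions on everything generated so far, and the diffusion head denoises all tokens of the next patch jointly. The following hypothetical decoding loop illustrates the control flow only; `backbone`, `head`, and the patch size are toy stand-ins, not the released API.

```python
import torch

PATCH = 16        # assumed number of tokens decoded in parallel per step
TOKEN_DIM = 256   # bits per binary token

def generate(backbone, head, num_patches: int) -> torch.Tensor:
    tokens = torch.empty(0, TOKEN_DIM)                 # generated sequence so far
    for _ in range(num_patches):
        cond = backbone(tokens)                        # [PATCH, hidden] conditions
        patch = head(cond)                             # denoise PATCH tokens jointly
        tokens = torch.cat([tokens, patch], dim=0)     # append the whole patch
    return tokens                                      # [num_patches * PATCH, TOKEN_DIM]

# Toy stand-ins so the sketch runs end to end.
backbone = lambda seq: torch.randn(PATCH, 1024)
head = lambda cond: torch.sign(torch.randn(cond.size(0), TOKEN_DIM))
print(generate(backbone, head, num_patches=4).shape)   # torch.Size([64, 256])
```

Under this reading, the reported speedups come from cutting the number of sequential backbone calls by the patch size while the diffusion head handles the within-patch joint prediction.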