LLaDA2.1: Speeding Up Text Diffusion via Token Editing

February 9, 2026
作者: Tiwei Bie, Maosong Cao, Xiang Cao, Bingsen Chen, Fuyuan Chen, Kun Chen, Lun Du, Daozhuo Feng, Haibo Feng, Mingliang Gong, Zhuocheng Gong, Yanmei Gu, Jian Guan, Kaiyuan Guan, Hongliang He, Zenan Huang, Juyong Jiang, Zhonghui Jiang, Zhenzhong Lan, Chengxi Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Yuan Lu, Yuxin Ma, Xingyu Mou, Zhenxuan Pan, Kaida Qiu, Yuji Ren, Jianfeng Tan, Yiding Tian, Zian Wang, Lanning Wei, Tao Wu, Yipeng Xing, Wentao Ye, Liangyu Zha, Tianze Zhang, Xiaolu Zhang, Junbo Zhao, Da Zheng, Hao Zhong, Wanli Zhong, Jun Zhou, Junlin Zhou, Liwang Zhu, Muzhi Zhu, Yihong Zhuang
cs.AI

Abstract

While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trade-off. By seamlessly weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct personas: the Speedy Mode (S Mode), which audaciously lowers the M2T threshold to bypass traditional constraints while relying on T2T to refine the output; and the Quality Mode (Q Mode), which leans into conservative thresholds to secure superior benchmark performance with manageable efficiency degradation. Furthering this evolution, underpinned by an expansive context window, we implement the first large-scale Reinforcement Learning (RL) framework specifically tailored for dLLMs, anchored by specialized techniques for stable gradient estimation. This alignment not only sharpens reasoning precision but also elevates instruction-following fidelity, bridging the chasm between diffusion dynamics and complex human intent. We culminate this work by releasing LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B). Across 33 rigorous benchmarks, LLaDA2.1 delivers strong task performance and lightning-fast decoding speed. Despite its 100B parameter scale, on coding tasks it attains an astounding 892 TPS on HumanEval+, 801 TPS on BigCodeBench, and 663 TPS on LiveCodeBench.
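To make the joint M2T/T2T threshold-decoding idea concrete, here is a minimal sketch of one parallel decoding step over a block. This is an illustrative toy, not the authors' implementation: the function name, the MASK sentinel, and the specific threshold values are all assumptions. Masked positions are committed when the model's confidence clears the M2T threshold, while already-committed positions can be revised when a different proposal clears the (typically higher) T2T threshold.

```python
# Illustrative sketch of joint M2T/T2T threshold decoding (NOT the authors'
# implementation; names and thresholds are assumptions for illustration).

MASK = None  # sentinel for an undecoded position in the block


def decode_step(tokens, proposals, confidences, m2t_threshold, t2t_threshold):
    """One parallel decoding step over a block.

    tokens      : current block contents, MASK where still undecoded
    proposals   : the model's top-1 token proposal at each position
    confidences : the model's confidence in each proposal
    """
    out = list(tokens)
    for i, (tok, prop, conf) in enumerate(zip(tokens, proposals, confidences)):
        if tok is MASK:
            # M2T: commit a masked position once confidence clears the bar.
            if conf >= m2t_threshold:
                out[i] = prop
        elif prop != tok and conf >= t2t_threshold:
            # T2T: overwrite an earlier commitment the model now disputes.
            out[i] = prop
    return out


# "Speedy Mode" flavor: a low M2T threshold commits many tokens per step,
# and T2T editing in later steps repairs premature commitments.
block = [MASK, "def", MASK, MASK]
props = ["import", "def", "foo", "("]
confs = [0.95, 0.99, 0.60, 0.80]
print(decode_step(block, props, confs, m2t_threshold=0.7, t2t_threshold=0.9))
```

Raising `m2t_threshold` toward `t2t_threshold` recovers the conservative "Quality Mode" behavior, where fewer tokens are committed per step but fewer later edits are needed.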