LLaDA2.1: Speeding Up Text Diffusion via Token Editing
February 9, 2026
Authors: Tiwei Bie, Maosong Cao, Xiang Cao, Bingsen Chen, Fuyuan Chen, Kun Chen, Lun Du, Daozhuo Feng, Haibo Feng, Mingliang Gong, Zhuocheng Gong, Yanmei Gu, Jian Guan, Kaiyuan Guan, Hongliang He, Zenan Huang, Juyong Jiang, Zhonghui Jiang, Zhenzhong Lan, Chengxi Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Yuan Lu, Yuxin Ma, Xingyu Mou, Zhenxuan Pan, Kaida Qiu, Yuji Ren, Jianfeng Tan, Yiding Tian, Zian Wang, Lanning Wei, Tao Wu, Yipeng Xing, Wentao Ye, Liangyu Zha, Tianze Zhang, Xiaolu Zhang, Junbo Zhao, Da Zheng, Hao Zhong, Wanli Zhong, Jun Zhou, Junlin Zhou, Liwang Zhu, Muzhi Zhu, Yihong Zhuang
cs.AI
Abstract
While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trade-off. By seamlessly weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) decoding process, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct operating modes: Speedy Mode (S Mode), which aggressively lowers the M2T threshold to break through traditional speed constraints while relying on T2T editing to refine the output; and Quality Mode (Q Mode), which adopts conservative thresholds to secure superior benchmark performance at a manageable cost in efficiency. Building on this, and underpinned by an expanded context window, we implement the first large-scale Reinforcement Learning (RL) framework tailored specifically for diffusion large language models (dLLMs), anchored by specialized techniques for stable gradient estimation. This alignment not only sharpens reasoning precision but also elevates instruction-following fidelity, bridging the gap between diffusion dynamics and complex human intent. We culminate this work by releasing LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B). Across 33 rigorous benchmarks, LLaDA2.1 delivers strong task performance and lightning-fast decoding: despite its 100B parameter scale, LLaDA2.1-Flash attains 892 TPS (tokens per second) on HumanEval+, 801 TPS on BigCodeBench, and 663 TPS on LiveCodeBench.
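To make the joint threshold mechanism concrete, the following is a minimal sketch of how M2T commitment and T2T editing could interact within one block. It is an illustration under stated assumptions, not the released LLaDA2.1 implementation: the names `model`, `MASK_ID`, `m2t_threshold`, and `t2t_threshold` are hypothetical, and `model` is assumed to return per-position logits over the vocabulary for the current block.

```python
import torch

MASK_ID = 0  # hypothetical id of the [MASK] token

@torch.no_grad()
def decode_block(model, block, m2t_threshold, t2t_threshold, max_steps=32):
    """Joint threshold decoding for one block (illustrative sketch).

    M2T: fill masked positions whose top prediction clears m2t_threshold.
    T2T: edit already-committed tokens where the model now prefers a
    different token with confidence above t2t_threshold.
    """
    for _ in range(max_steps):
        probs = torch.softmax(model(block), dim=-1)  # (block_len, vocab)
        conf, pred = probs.max(dim=-1)
        masked = block == MASK_ID

        commit = masked & (conf >= m2t_threshold)                    # M2T
        edit = ~masked & (pred != block) & (conf >= t2t_threshold)   # T2T

        if masked.any() and not commit.any():
            # Guarantee progress: commit the single most confident masked slot.
            idx = torch.where(masked, conf, torch.full_like(conf, -1.0)).argmax()
            commit[idx] = True

        block = torch.where(commit | edit, pred, block)
        if not (block == MASK_ID).any():
            break  # block fully decoded; any remaining T2T edits are optional
    return block
```

Under this sketch, S Mode would call `decode_block` with a low `m2t_threshold`, committing many tokens per forward pass and letting T2T correct premature choices, whereas Q Mode would raise the thresholds so that fewer, higher-confidence commitments are made at each step.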