T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
February 12, 2026
Authors: Tunyu Zhang, Xinxi Zhang, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Hao Wang, Vladimir Pavlovic, Dimitris N. Metaxas
cs.AI
Abstract
Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substantial degradation in generation quality. To alleviate this, we propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. We incorporate Direct Discriminative Optimization (DDO), a reverse-KL objective that promotes mode-seeking distillation and encourages the student to concentrate on high-probability teacher modes. Across benchmarks, our approach consistently outperforms strong few-step baselines and standard training under tight step budgets. Although full-step decoding remains superior, we substantially narrow the gap, establishing a strong foundation for practical few-step DLLMs. The source code is available at https://github.com/Tyrion58/T3D.
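To make the "mode-seeking" intuition behind the reverse-KL objective concrete, below is a minimal illustrative sketch of a reverse-KL distillation loss over per-token distributions. It is not the paper's exact DDO formulation; the function name, tensor shapes, and the assumption that student and teacher expose token-level logits are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def reverse_kl_distillation_loss(student_logits: torch.Tensor,
                                 teacher_logits: torch.Tensor) -> torch.Tensor:
    """Illustrative mode-seeking distillation loss: KL(student || teacher).

    Both inputs are unnormalized logits of shape (batch, seq_len, vocab).
    Minimizing KL(q_student || p_teacher) penalizes the student for placing
    probability mass where the teacher assigns little, which pushes the
    student toward the teacher's high-probability modes (in contrast to the
    mass-covering forward KL(teacher || student)).
    """
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    student_probs = student_log_probs.exp()
    # KL(q || p) = sum_x q(x) * (log q(x) - log p(x)), averaged over all tokens.
    kl_per_token = (student_probs * (student_log_probs - teacher_log_probs)).sum(dim=-1)
    return kl_per_token.mean()
```

In a trajectory self-distillation setting, the "teacher" distribution would come from the model's own full-step (many refinement steps) trajectories and the "student" from a few-step decode of the same model, with gradients flowing only through the student; the actual DDO objective in the paper adds a discriminative formulation on top of this basic reverse-KL idea.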