T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

February 12, 2026
作者: Tunyu Zhang, Xinxi Zhang, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Hao Wang, Vladimir Pavlovic, Dimitris N. Metaxas
cs.AI

Abstract

Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substantial degradation in generation quality. To alleviate this, we propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. We incorporate Direct Discriminative Optimization (DDO), a reverse-KL objective that promotes mode-seeking distillation and encourages the student to concentrate on high-probability teacher modes. Across benchmarks, our approach consistently outperforms strong few-step baselines and standard training under tight step budgets. Although full-step decoding remains superior, we substantially narrow the gap, establishing a strong foundation towards practical few-step DLLMs. The source code is available at https://github.com/Tyrion58/T3D.
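For reference only (the exact DDO formulation, and how it is combined with trajectory self-distillation, is defined in the paper rather than here), a reverse-KL distillation objective evaluated on the student's own samples $x \sim p_\theta$ takes the standard mode-seeking form

\[
\mathcal{L}_{\mathrm{RKL}}(\theta) \;=\; \mathrm{KL}\!\left(p_\theta \,\|\, p_{\mathrm{teacher}}\right) \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[\log p_\theta(x) \;-\; \log p_{\mathrm{teacher}}(x)\right],
\]

which heavily penalizes the student for placing probability mass where the teacher assigns little, and thus pushes the few-step student toward the teacher's high-probability modes rather than averaging over them.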