Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
March 20, 2026
Authors: Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans
cs.AI
Abstract
It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling to a handful of steps.
Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.
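To build intuition for moment matching in a discrete setting, the toy sketch below compares a "teacher" and a "student" token distribution via a simple feature expectation (token frequencies). This is only an illustration of the general moment-matching idea; all names and the choice of feature function are hypothetical and not taken from the D-MMD paper itself.

```python
# Toy illustration of moment matching over discrete tokens.
# NOT the D-MMD algorithm from the paper; a hypothetical simplification:
# the student would be trained to shrink the gap between its expected
# features and the teacher's.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 5  # toy vocabulary of 5 discrete tokens


def feature(tokens: np.ndarray) -> np.ndarray:
    """Empirical token frequencies: one simple choice of 'moment'."""
    counts = np.bincount(tokens, minlength=VOCAB)
    return counts / counts.sum()


# "Teacher" samples from a fixed categorical distribution.
teacher_probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
teacher_tokens = rng.choice(VOCAB, size=10_000, p=teacher_probs)

# "Student" samples from a (mismatched) uniform distribution.
student_probs = np.full(VOCAB, 1.0 / VOCAB)
student_tokens = rng.choice(VOCAB, size=10_000, p=student_probs)

# Moment-matching gap: squared distance between expected features.
# A distillation objective would drive this toward zero.
loss = float(np.sum((feature(teacher_tokens) - feature(student_tokens)) ** 2))
print(f"moment gap: {loss:.4f}")
```

Because the student here samples uniformly while the teacher is skewed, the printed gap is clearly nonzero; matching these expectations is what a moment-matching objective optimizes.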