Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation
August 28, 2024
Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun
cs.AI
Abstract
Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a one-step student generator, which is optimized by computing the difference between two score functions evaluated on samples generated by the student model. However, a score mismatch issue arises in the early stage of the distillation process, because existing methods mainly focus on using the endpoint of pre-trained diffusion models as teacher models, overlooking the importance of the convergence trajectory between the student generator and the teacher model. To address this issue, we extend the score distillation process by introducing the entire convergence trajectory of teacher models and propose Distribution Backtracking Distillation (DisBack) for distilling student generators. DisBack is composed of two stages: Degradation Recording and Distribution Backtracking. Degradation Recording is designed to obtain the convergence trajectory of teacher models by recording the degradation path from the trained teacher model to the untrained initial student generator. The degradation path implicitly represents the intermediate distributions of teacher models. Distribution Backtracking then trains the student generator to backtrack these intermediate distributions, approximating the convergence trajectory of teacher models. Extensive experiments show that DisBack achieves faster and better convergence than existing distillation methods while attaining comparable generation performance. Notably, DisBack is easy to implement and can be generalized to existing distillation methods to boost performance. Our code is publicly available at https://github.com/SYZhang0805/DisBack.
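The abstract only outlines the two stages at a high level. Below is a minimal, illustrative sketch of how such a pipeline could look, assuming a Diff-Instruct/DMD-style score-distillation setup (a "fake" score network kept fitted to the generator's samples, and a score-difference gradient applied to those samples). All names here (ScoreNet, OneStepGenerator, degradation_recording, distribution_backtracking, z_sampler) are hypothetical stand-ins, not the authors' API; see the official repository for the actual implementation.

```python
# Minimal sketch of the two-stage DisBack procedure described in the abstract.
# Toy MLPs stand in for real diffusion backbones; details are assumptions.
import copy
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Toy stand-in for a diffusion score network s(x, t)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

class OneStepGenerator(nn.Module):
    """Toy one-step student generator g(z)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, z):
        return self.net(z)

def score_matching_step(score_net, opt, samples):
    """One denoising-score-matching update of score_net on `samples`
    (Gaussian perturbation kernel; the noise schedule is an assumption)."""
    t = torch.rand(samples.size(0), 1)
    sigma = 0.1 + 0.9 * t
    noise = torch.randn_like(samples)
    x_t = samples + sigma * noise
    target = -noise / sigma                     # score of the perturbation kernel
    loss = ((score_net(x_t, t) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def degradation_recording(teacher, student, z_sampler, n_ckpts=4, steps_per_ckpt=200):
    """Stage 1: fine-tune a copy of the teacher toward the *initial* student's
    output distribution, saving intermediate checkpoints. Reversed, these
    checkpoints approximate the teacher's convergence trajectory."""
    degraded = copy.deepcopy(teacher)
    opt = torch.optim.Adam(degraded.parameters(), lr=1e-4)
    path = [copy.deepcopy(teacher)]             # endpoint: the trained teacher itself
    with torch.no_grad():
        fake = student(z_sampler(4096))         # samples from the untrained student
    for _ in range(n_ckpts):
        for _ in range(steps_per_ckpt):
            idx = torch.randint(0, fake.size(0), (256,))
            score_matching_step(degraded, opt, fake[idx])
        path.append(copy.deepcopy(degraded))
    return path                                 # teacher -> ... -> near initial student

def distribution_backtracking(path, student, z_sampler, fake_score, steps_per_stage=500):
    """Stage 2: train the student against each checkpoint of the degradation
    path in reverse order, using a score-difference gradient evaluated on
    (lightly noised) student samples."""
    g_opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    f_opt = torch.optim.Adam(fake_score.parameters(), lr=1e-4)
    for target_model in reversed(path):         # backtrack: near-student -> teacher
        for _ in range(steps_per_stage):
            x = student(z_sampler(256))
            # keep the "fake" score network fitted to the current student samples
            score_matching_step(fake_score, f_opt, x.detach())
            t = torch.rand(x.size(0), 1)
            sigma = 0.1 + 0.9 * t
            with torch.no_grad():
                x_t = x + sigma * torch.randn_like(x)
                grad = fake_score(x_t, t) - target_model(x_t, t)   # score difference
            g_opt.zero_grad()
            x.backward(gradient=grad / x.size(0))   # mean-reduced surrogate gradient
            g_opt.step()
    return student
```

A toy run could wire these together as `path = degradation_recording(teacher, student, lambda n: torch.randn(n, 2))` followed by `distribution_backtracking(path, student, lambda n: torch.randn(n, 2), ScoreNet())`, with real diffusion backbones and noise schedules substituted for the toy components above.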