
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models

December 17, 2023
Authors: Nikita Starodubcev, Artem Fedorov, Artem Babenko, Dmitry Baranchuk
cs.AI

Abstract

Knowledge distillation methods have recently been shown to be a promising direction for speeding up the synthesis of large-scale diffusion models by requiring only a few inference steps. While several powerful distillation methods were recently proposed, the overall quality of student samples is typically lower than that of the teacher samples, which hinders their practical use. In this work, we investigate the relative quality of samples produced by the teacher text-to-image diffusion model and its distilled student version. As our main empirical finding, we discover that a noticeable portion of student samples exhibit superior fidelity compared to the teacher ones, despite the "approximate" nature of the student. Based on this finding, we propose an adaptive collaboration between student and teacher diffusion models for effective text-to-image synthesis. Specifically, the distilled model produces the initial sample, and then an oracle decides whether it needs further improvement by the slow teacher model. Extensive experiments demonstrate that the designed pipeline surpasses state-of-the-art text-to-image alternatives across various inference budgets in terms of human preference. Furthermore, the proposed approach can be naturally used in popular applications such as text-guided image editing and controllable generation.
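The adaptive pipeline the abstract describes — a fast student draft, an oracle gate, and an optional slow teacher refinement — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`student_generate`, `oracle_score`, `teacher_refine`) and the scalar `threshold` are assumed placeholders for the distilled model, the quality oracle, and the teacher refinement step.

```python
from typing import Callable

def adaptive_generate(
    prompt: str,
    student_generate: Callable[[str], object],
    oracle_score: Callable[[str, object], float],
    teacher_refine: Callable[[str, object], object],
    threshold: float = 0.5,
) -> object:
    """Produce a sample with the fast student, escalating to the slow
    teacher only when the oracle judges the draft insufficient.
    All callables here are hypothetical stand-ins for the paper's models."""
    sample = student_generate(prompt)            # few-step distilled model
    if oracle_score(prompt, sample) >= threshold:
        return sample                            # student draft accepted as-is
    return teacher_refine(prompt, sample)        # slow teacher improves the draft
```

Because the teacher runs only on the drafts the oracle rejects, the average inference cost interpolates between the student's and the teacher's, controlled by `threshold`.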