Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
December 17, 2023
Authors: Nikita Starodubcev, Artem Fedorov, Artem Babenko, Dmitry Baranchuk
cs.AI
Abstract
Knowledge distillation methods have recently been shown to be a promising
direction for speeding up the synthesis of large-scale diffusion models by
requiring only a few inference steps. While several powerful distillation
methods have recently been proposed, the overall quality of student samples
is typically lower than the teacher's, which hinders their practical use. In
this work, we investigate the relative quality of samples produced by a
teacher text-to-image diffusion model and its distilled student version. As
our main empirical finding, we discover that a noticeable portion of student
samples exhibit superior fidelity compared to the teacher's, despite the
"approximate" nature of the student. Based on this finding, we propose an
adaptive collaboration between student and teacher diffusion models for
effective text-to-image synthesis. Specifically, the distilled model produces
an initial sample, and an oracle then decides whether it needs further
improvement with the slow teacher model. Extensive experiments demonstrate
that the designed pipeline surpasses state-of-the-art text-to-image
alternatives across various inference budgets in terms of human preference.
Furthermore, the proposed approach can be naturally applied to popular
applications such as text-guided image editing and controllable generation.
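
The pipeline described in the abstract reduces to a simple control flow:
sample with the fast distilled student, score the result with an oracle, and
escalate to the slow teacher only when the oracle rejects the draft. Below is
a minimal sketch of this adaptive routing; the names student, teacher_refine,
oracle, and the acceptance threshold are illustrative placeholders assumed
for this sketch, not interfaces defined by the paper.

from typing import Any, Callable

Image = Any  # placeholder type for a generated image

def adaptive_generate(
    prompt: str,
    student: Callable[[str], Image],                # fast distilled sampler (a few steps)
    teacher_refine: Callable[[str, Image], Image],  # slow teacher used as a refiner
    oracle: Callable[[str, Image], float],          # estimates sample fidelity for the prompt
    threshold: float = 0.5,                         # illustrative acceptance cutoff
) -> Image:
    """Generate with the student first; invoke the teacher only when
    the oracle judges the student's sample insufficient."""
    draft = student(prompt)                 # cheap initial sample
    if oracle(prompt, draft) >= threshold:  # oracle accepts the student's draft
        return draft
    # Otherwise, refine the draft with the slow teacher, e.g. by
    # re-noising it and continuing the reverse diffusion process.
    return teacher_refine(prompt, draft)

Because only the prompts whose drafts fail the oracle check incur the
teacher's cost, the average inference budget stays close to the student's,
while the accepted or refined outputs can match or exceed the teacher's
quality.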