Align Your Flow: Scaling Continuous-Time Flow Map Distillation
June 17, 2025
Authors: Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
cs.AI
Abstract
Diffusion- and flow-based models have emerged as state-of-the-art generative
modeling approaches, but they require many sampling steps. Consistency models
can distill these models into efficient one-step generators; however, unlike
flow- and diffusion-based methods, their performance inevitably degrades when
increasing the number of steps, which we show both analytically and
empirically. Flow maps generalize these approaches by connecting any two noise
levels in a single step and remain effective across all step counts. In this
paper, we introduce two new continuous-time objectives for training flow maps,
along with additional novel training techniques, generalizing existing
consistency and flow matching objectives. We further demonstrate that
autoguidance can improve performance, using a low-quality model for guidance
during distillation, and an additional boost can be achieved by adversarial
finetuning, with minimal loss in sample diversity. We extensively validate our
flow map models, called Align Your Flow, on challenging image generation
benchmarks and achieve state-of-the-art few-step generation performance on both
ImageNet 64×64 and 512×512, using small and efficient neural networks. Finally,
we show text-to-image flow map models that outperform all existing
non-adversarially trained few-step samplers in text-conditioned synthesis.
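
To make the flow-map notion concrete, here is a minimal sketch in our own notation (the symbols $f_\theta$ and $v$ and the time convention $t = 1$ for pure noise, $t = 0$ for data are assumptions, not taken from the paper): a flow map jumps directly between any two noise levels along the probability-flow ODE, and consistency models fall out as the special case of always jumping to the data endpoint.

```latex
\[
  \frac{\mathrm{d}x_t}{\mathrm{d}t} = v(x_t, t), \qquad
  f_\theta(x_t, t, s) \;\approx\; x_s \;=\; x_t + \int_t^s v(x_u, u)\,\mathrm{d}u,
\]
\[
  f_\theta(x, t, t) = x \quad \text{(boundary condition)}, \qquad
  f_\theta(x_t, t, 0) \approx x_0 \quad \text{(consistency-model special case)}.
\]
```

Because the map is trained to be consistent between arbitrary level pairs $(t, s)$, not only $(t, 0)$, it remains usable at any step count, which is the property the abstract contrasts with consistency models.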
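A few-step sampler built on such a flow map is then just repeated application of the map along a coarse time grid. The sketch below is our illustration under the interface assumed above, not the paper's actual code or API; `flow_map` stands in for any trained network with that signature.

```python
import torch

@torch.no_grad()
def few_step_sample(flow_map, shape, num_steps=4, device="cpu"):
    """Illustrative few-step sampler for a trained flow map.

    `flow_map(x, t, s)` is assumed to map a sample at noise level t
    directly to noise level s (t = 1 is pure noise, s = 0 is data);
    this interface is our assumption, not the paper's actual API.
    """
    x = torch.randn(shape, device=device)              # start from pure noise at t = 1
    times = torch.linspace(1.0, 0.0, num_steps + 1)    # coarse grid from noise to data
    for t, s in zip(times[:-1], times[1:]):
        x = flow_map(x, t.item(), s.item())            # one direct jump per step
    return x
```

With `num_steps=1` this reduces to one-step generation, while larger step counts trade compute for quality without retraining, matching the "effective across all step counts" claim.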
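The autoguidance mentioned in the abstract refers to guiding a model with a deliberately degraded version of itself (Karras et al., "Guiding a Diffusion Model with a Bad Version of Itself", 2024). In generic velocity-field form it extrapolates away from the weak model; the abstract does not specify the exact variant used during distillation, so the following is only the standard formula for reference:

```latex
\[
  v_{\text{guided}}(x, t)
  \;=\; v_{\text{weak}}(x, t)
  \;+\; w\,\bigl(v_{\text{strong}}(x, t) - v_{\text{weak}}(x, t)\bigr),
  \qquad w > 1.
\]
```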