Align Your Flow: Scaling Continuous-Time Flow Map Distillation
June 17, 2025
Authors: Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
cs.AI
Abstract
Diffusion- and flow-based models have emerged as state-of-the-art generative
modeling approaches, but they require many sampling steps. Consistency models
can distill these models into efficient one-step generators; however, unlike
flow- and diffusion-based methods, their performance inevitably degrades when
increasing the number of steps, which we show both analytically and
empirically. Flow maps generalize these approaches by connecting any two noise
levels in a single step and remain effective across all step counts. In this
paper, we introduce two new continuous-time objectives for training flow maps,
along with additional novel training techniques, generalizing existing
consistency and flow matching objectives. We further demonstrate that
autoguidance can improve performance, using a low-quality model for guidance
during distillation, and an additional boost can be achieved by adversarial
finetuning, with minimal loss in sample diversity. We extensively validate our
flow map models, called Align Your Flow, on challenging image generation
benchmarks and achieve state-of-the-art few-step generation performance on both
ImageNet 64x64 and 512x512, using small and efficient neural networks. Finally,
we show text-to-image flow map models that outperform all existing
non-adversarially trained few-step samplers in text-conditioned synthesis.
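
As a point of reference, here is a minimal sketch of the flow-map idea described in the abstract, written in notation assumed here (the symbols f_{s,t}, x_t and the noise levels s, t are illustrative and not necessarily the paper's exact formulation). A flow map sends a sample x_t at noise level t directly to the point x_s at noise level s on the same probability-flow ODE trajectory:

    f_{s,t}(x_t) \approx x_s,  \quad 0 \le s \le t \le T,  \qquad \text{with boundary condition } f_{t,t}(x_t) = x_t.

A consistency model corresponds to the special case that always jumps to the clean endpoint, s = 0, i.e. f_{0,t}(x_t) \approx x_0. This is why, as the abstract notes, consistency models cover one-step generation but do not retain the flexibility of stepping between arbitrary noise levels, whereas flow maps remain effective across all step counts.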