MixFlow: Mixed Source Distributions Improve Rectified Flows
April 10, 2026
Authors: Nazir Nayal, Christopher Wewer, Jan Eric Lenssen
cs.AI
Abstract
Diffusion models and their variants, such as rectified flows, generate diverse and high-quality images, but they are still hindered by slow iterative sampling caused by the highly curved generative paths they learn. As shown by previous work, an important cause of this high curvature is the independence between the source distribution (a standard Gaussian) and the data distribution. In this work, we address this limitation with two complementary contributions. First, we break away from the standard Gaussian assumption by introducing κ-FC, a general formulation that conditions the source distribution on an arbitrary signal κ, aligning it better with the data distribution. Second, we present MixFlow, a simple but effective training strategy that reduces generative path curvature and considerably improves sampling efficiency. MixFlow trains a flow model on linear mixtures of a fixed unconditional distribution and a κ-FC-based distribution. This simple mixture improves the alignment between source and data, yields better generation quality with fewer sampling steps, and accelerates training convergence considerably. On average, under a fixed sampling budget, our training procedure improves generation quality by 12% in FID compared to standard rectified flow and by 7% compared to previous baselines. Code available at: https://github.com/NazirNayal8/MixFlow
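To make the mixture construction concrete, here is a minimal sketch of how a rectified-flow training pair might be built from a linearly mixed source, assuming the mixture is applied per sample with a scalar coefficient. The names `mixed_source`, `kappa_sample`, and the value of `alpha` are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_source(x1, kappa_sample, alpha):
    """Hypothetical linear mixture of a fixed unconditional Gaussian draw
    and a kappa-conditioned sample (a stand-in for the kappa-FC source)."""
    z = rng.standard_normal(x1.shape)  # fixed unconditional source
    return (1.0 - alpha) * z + alpha * kappa_sample

def rectified_flow_pair(x1, x0, t):
    """Linear interpolation path x_t = (1-t) x0 + t x1 and its
    constant velocity target v = x1 - x0, as in standard rectified flow."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

# Toy usage: the kappa-conditioned sample is mocked here as a noisy
# copy of the data, purely for illustration.
x1 = rng.standard_normal((4, 8))                          # data batch
kappa_sample = x1 + 0.5 * rng.standard_normal(x1.shape)   # illustrative only
x0 = mixed_source(x1, kappa_sample, alpha=0.3)
xt, v = rectified_flow_pair(x1, x0, t=0.5)
```

A flow model would then regress `v` from `(xt, t)`; the mixture only changes where `x0` is drawn from, leaving the rectified-flow objective itself untouched.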