Gaussian Mixture Flow Matching Models

Abstract

Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, both underperform in few-step sampling because of discretization error, and both tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. We demonstrate that GMFlow generalizes previous diffusion and flow matching models, in which a single Gaussian is learned with an L_2 denoising loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling. Furthermore, we introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality. Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256×256.
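The claim that a GM objective generalizes the L_2 denoising loss can be illustrated with a small sketch. It is not the authors' implementation: it assumes isotropic components with a shared, fixed standard deviation `sigma`, and the names `gm_nll` and `scaled_l2` are hypothetical. Minimizing the KL divergence from the true velocity distribution to the predicted mixture is equivalent, up to a constant, to minimizing the expected negative log-likelihood (NLL) of sampled velocities, so the sketch evaluates that NLL for a single sample.

```python
import math


def gm_nll(u, weights, means, sigma):
    """NLL of a velocity sample u under an isotropic Gaussian mixture.

    Assumptions (not from the source): every component shares the same
    fixed standard deviation sigma, and weights sum to 1.
    u: list of D floats; weights: list of K floats; means: K lists of D floats.
    """
    d = len(u)
    # Log of the Gaussian normalizing constant, shared by all components.
    log_norm = -0.5 * d * math.log(2.0 * math.pi * sigma ** 2)
    log_terms = []
    for w, mu in zip(weights, means):
        sq_dist = sum((ui - mi) ** 2 for ui, mi in zip(u, mu))
        log_terms.append(math.log(w) + log_norm - 0.5 * sq_dist / sigma ** 2)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


def scaled_l2(u, mu, sigma):
    """Squared L_2 distance between u and mu, scaled by 1 / (2 sigma^2)."""
    return 0.5 * sum((ui - mi) ** 2 for ui, mi in zip(u, mu)) / sigma ** 2
```

With one component (K = 1), `gm_nll` equals `scaled_l2` plus a constant that does not depend on the predicted mean, so minimizing it is the same as minimizing an L_2 loss. With two well-separated components, a sample lying on one mode receives a lower NLL than it would under a single Gaussian centered between the modes, which is the multi-modal behavior the abstract describes.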
