Gaussian Mixture Flow Matching Models

April 7, 2025
Authors: Hansheng Chen, Kai Zhang, Hao Tan, Zexiang Xu, Fujun Luan, Leonidas Guibas, Gordon Wetzstein, Sai Bi
cs.AI

Abstract

Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. We demonstrate that GMFlow generalizes previous diffusion and flow matching models, in which a single Gaussian is learned with an L_2 denoising loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling. Furthermore, we introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality. Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256×256.
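
The KL divergence objective mentioned in the abstract can be illustrated with a minimal sketch. Minimizing the KL divergence from a target distribution to a predicted mixture is equivalent, up to a constant that does not depend on the model, to minimizing the negative log-likelihood of target samples under that mixture. The function below computes this negative log-likelihood for one velocity sample. The name `gm_nll`, the shared isotropic standard deviation, and the plain-list inputs are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def gm_nll(u, logits, means, log_std):
    """Negative log-likelihood of one sample under an isotropic Gaussian mixture.

    u       : list of D floats, a sampled velocity target (hypothetical input)
    logits  : list of K floats, unnormalized log mixture weights
    means   : list of K lists of D floats, component means
    log_std : float, log of the standard deviation shared by all components
    """
    d = len(u)

    # Log-softmax normalizer for the mixture weights (stabilized by the max).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(a - m) for a in logits))

    log_terms = []
    for a, mu in zip(logits, means):
        sq_dist = sum((ui - mi) ** 2 for ui, mi in zip(u, mu))
        # Log density of an isotropic Gaussian N(mu, std^2 I) evaluated at u.
        log_gauss = (
            -0.5 * sq_dist / math.exp(2.0 * log_std)
            - d * log_std
            - 0.5 * d * math.log(2.0 * math.pi)
        )
        log_terms.append((a - log_z) + log_gauss)

    # Log-sum-exp over components gives the mixture log-likelihood.
    mx = max(log_terms)
    log_lik = mx + math.log(sum(math.exp(t - mx) for t in log_terms))
    return -log_lik


if __name__ == "__main__":
    # Two components in 2-D; the sample sits on the first component's mean.
    loss = gm_nll([1.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]], 0.0)
    print(loss)
```

With a single component (K = 1) and a fixed standard deviation, this loss reduces to a squared-error term plus a constant, which matches the abstract's remark that single-Gaussian models trained with an L_2 loss are a special case.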
