CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

March 24, 2025
Authors: Weichen Fan, Amber Yijia Zheng, Raymond A. Yeh, Ziwei Liu
cs.AI

Abstract

Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion/flow models to improve image fidelity and controllability. In this work, we first analytically study the effect of CFG on flow matching models trained on Gaussian mixtures where the ground-truth flow can be derived. We observe that in the early stages of training, when the flow estimation is inaccurate, CFG directs samples toward incorrect trajectories. Building on this observation, we propose CFG-Zero*, an improved CFG with two contributions: (a) optimized scale, where a scalar is optimized to correct for the inaccuracies in the estimated velocity, hence the * in the name; and (b) zero-init, which involves zeroing out the first few steps of the ODE solver. Experiments on both text-to-image (Lumina-Next, Stable Diffusion 3, and Flux) and text-to-video (Wan-2.1) generation demonstrate that CFG-Zero* consistently outperforms CFG, highlighting its effectiveness in guiding Flow Matching models. (Code is available at github.com/WeichenFan/CFG-Zero-star)
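
The two contributions named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the optimized scale is taken to be the least-squares scalar that best aligns the unconditional velocity with the conditional one, zero-init is taken to mean using a zero velocity for the first `zero_init_steps` solver steps, and the names `optimized_scale`, `guided_velocity`, `euler_sample`, and `velocity_fn` are hypothetical. The authors' repository is the reference for the actual method.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def optimized_scale(v_cond: Sequence[float], v_uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Assumed form of the optimized scalar: the s minimizing
    ||v_cond - s * v_uncond||^2, i.e. a least-squares projection."""
    return dot(v_cond, v_uncond) / (dot(v_uncond, v_uncond) + eps)


def guided_velocity(v_cond: Sequence[float], v_uncond: Sequence[float],
                    w: float, step: int, zero_init_steps: int = 1) -> Vector:
    """CFG-style velocity with a rescaled unconditional branch.

    Standard CFG would return v_uncond + w * (v_cond - v_uncond).
    Here v_uncond is first multiplied by the optimized scalar (assumption),
    and the first `zero_init_steps` steps return zero (zero-init).
    """
    if step < zero_init_steps:
        return [0.0] * len(v_cond)
    s = optimized_scale(v_cond, v_uncond)
    return [s * u + w * (c - s * u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0: Sequence[float],
                 velocity_fn: Callable[[Vector, float], Tuple[Vector, Vector]],
                 num_steps: int, w: float, zero_init_steps: int = 1) -> Vector:
    """Euler ODE solver from t=0 to t=1 using the guided velocity.

    `velocity_fn(x, t)` is a hypothetical stand-in for the model; it must
    return the pair (conditional velocity, unconditional velocity).
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v_cond, v_uncond = velocity_fn(x, t)
        v = guided_velocity(v_cond, v_uncond, w, step, zero_init_steps)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Under these assumptions, setting the scalar to 1 and `zero_init_steps` to 0 recovers ordinary CFG, which makes the sketch easy to compare against a baseline sampler.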
