CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
March 24, 2025
Authors: Weichen Fan, Amber Yijia Zheng, Raymond A. Yeh, Ziwei Liu
cs.AI
Abstract
Classifier-Free Guidance (CFG) is a widely adopted technique in
diffusion/flow models to improve image fidelity and controllability. In this
work, we first analytically study the effect of CFG on flow matching models
trained on Gaussian mixtures where the ground-truth flow can be derived. We
observe that in the early stages of training, when the flow estimation is
inaccurate, CFG directs samples toward incorrect trajectories. Building on this
observation, we propose CFG-Zero*, an improved CFG with two contributions: (a)
optimized scale, where a scalar is optimized to correct for the inaccuracies in
the estimated velocity, hence the * in the name; and (b) zero-init, which
involves zeroing out the first few steps of the ODE solver. Experiments on both
text-to-image (Lumina-Next, Stable Diffusion 3, and Flux) and text-to-video
(Wan-2.1) generation demonstrate that CFG-Zero* consistently outperforms CFG,
highlighting its effectiveness in guiding flow matching models. (Code is
available at github.com/WeichenFan/CFG-Zero-star)
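
The abstract names the two components but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the optimized scalar is the least-squares coefficient projecting the conditional velocity onto the unconditional one, that this scalar rescales the unconditional branch inside the usual CFG combination, and that zero-init means returning a zero velocity for the first few solver steps. The names `optimized_scale`, `guided_velocity`, and `zero_init_steps` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def optimized_scale(v_cond: Sequence[float], v_uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Scalar s minimizing ||v_cond - s * v_uncond||^2.

    Assumption: this closed-form projection coefficient stands in for the
    "optimized scale" named in the abstract.
    """
    dot = sum(c * u for c, u in zip(v_cond, v_uncond))
    sq_norm = sum(u * u for u in v_uncond)
    return dot / (sq_norm + eps)  # eps guards against a zero unconditional velocity


def guided_velocity(v_cond: Sequence[float], v_uncond: Sequence[float],
                    w: float, step: int, zero_init_steps: int = 1) -> Vector:
    """Combine conditional and unconditional velocities for one ODE solver step.

    Assumptions: `w` is the guidance scale, `step` is the 0-based solver step,
    and zero-init replaces the velocity with zeros for the first
    `zero_init_steps` steps.
    """
    if step < zero_init_steps:
        # Zero-init: leave the sample unchanged during the earliest steps.
        return [0.0 for _ in v_cond]
    s = optimized_scale(v_cond, v_uncond)
    # Standard CFG is u + w * (c - u); here the unconditional branch is rescaled by s.
    return [s * u + w * (c - s * u) for c, u in zip(v_cond, v_uncond)]


# Toy 2-D velocities: the two predictions are parallel but differ in magnitude.
v_c, v_u = [2.0, 0.0], [1.0, 0.0]
first_step = guided_velocity(v_c, v_u, w=4.0, step=0)  # zeroed by zero-init
later_step = guided_velocity(v_c, v_u, w=4.0, step=1)  # rescaled guidance
```

In this toy case plain CFG with `w = 4` would extrapolate to a velocity of about 5 along the first axis, while the rescaled combination stays close to the conditional prediction of 2, because the scalar absorbs the magnitude mismatch between the two branches.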