CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

Abstract

Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion and flow models for improving image fidelity and controllability. In this work, we first analytically study the effect of CFG on flow matching models trained on Gaussian mixtures, where the ground-truth flow can be derived. We observe that in the early stages of training, when the flow estimate is inaccurate, CFG directs samples toward incorrect trajectories. Building on this observation, we propose CFG-Zero*, an improved CFG with two contributions: (a) optimized scale, where a scalar is optimized to correct for inaccuracies in the estimated velocity (hence the * in the name); and (b) zero-init, which zeroes out the first few steps of the ODE solver. Experiments on text-to-image (Lumina-Next, Stable Diffusion 3, and Flux) and text-to-video (Wan-2.1) generation show that CFG-Zero* consistently outperforms CFG, highlighting its effectiveness in guiding flow matching models. Code is available at github.com/WeichenFan/CFG-Zero-star.
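To make the two contributions concrete, the following is a minimal sketch of how a guided velocity might be computed at each solver step. It is an illustration under stated assumptions, not the released implementation. The abstract says only that a scalar is optimized; the closed-form least-squares projection used here is one natural choice and is an assumption. The function names (`optimized_scale`, `cfg_zero_star`, `euler_sample`), the `zero_init_steps` parameter, the flat-list representation of velocities, and the Euler solver integrating from t = 0 to t = 1 are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def optimized_scale(v_cond: Sequence[float], v_uncond: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Scalar s minimizing ||v_cond - s * v_uncond||^2 (assumed closed form)."""
    dot = sum(c * u for c, u in zip(v_cond, v_uncond))
    sq_norm = sum(u * u for u in v_uncond) + eps  # eps guards against division by zero
    return dot / sq_norm


def standard_cfg(v_cond: Sequence[float], v_uncond: Sequence[float],
                 w: float) -> Vector:
    """Standard CFG: unconditional velocity plus w times the conditional offset."""
    return [u + w * (c - u) for c, u in zip(v_cond, v_uncond)]


def cfg_zero_star(v_cond: Sequence[float], v_uncond: Sequence[float],
                  w: float, step: int, zero_init_steps: int = 1) -> Vector:
    """Guided velocity with zero-init and a rescaled unconditional branch."""
    # Zero-init: return a zero velocity for the first few solver steps.
    if step < zero_init_steps:
        return [0.0] * len(v_cond)
    # Optimized scale: rescale the unconditional velocity before guiding.
    s = optimized_scale(v_cond, v_uncond)
    return [s * u + w * (c - s * u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(x0: Sequence[float],
                 velocity_fn: Callable[[Vector, float], Tuple[Vector, Vector]],
                 num_steps: int, w: float, zero_init_steps: int = 1) -> Vector:
    """Euler integration; velocity_fn returns (v_cond, v_uncond) at (x, t)."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v_cond, v_uncond = velocity_fn(x, k * dt)
        v = cfg_zero_star(v_cond, v_uncond, w, k, zero_init_steps)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In this sketch, when the scale is close to 1 the guided velocity coincides with standard CFG, so the two methods differ only where the conditional and unconditional estimates disagree in magnitude or during the zeroed initial steps.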
