S^2-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
August 18, 2025
Authors: Chubin Chen, Jiashu Zhu, Xiaokun Feng, Nisha Huang, Meiqi Wu, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, Xiu Li
cs.AI
Abstract
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis of Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation tasks demonstrate that S^2-Guidance delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
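
The abstract does not spell out the exact update rule, so the minimal PyTorch sketch below only illustrates the described mechanism: randomly dropping blocks of the network to obtain a stochastic sub-network, then using that sub-network's prediction as a negative guidance direction alongside standard CFG. ToyDenoiser, p_drop, w, w_s, and the combination eps_cfg - w_s * eps_sub are illustrative assumptions, not the authors' formulation.

# Illustrative sketch only; the architecture, drop probability, and the
# guidance combination rule below are assumptions, not the paper's method.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """A stack of residual blocks standing in for a diffusion backbone."""
    def __init__(self, dim=64, num_blocks=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
            for _ in range(num_blocks)
        )

    def forward(self, x, cond=None, drop_prob=0.0):
        # Randomly skipping residual blocks realizes a stochastic sub-network.
        h = x if cond is None else x + cond
        for block in self.blocks:
            if drop_prob > 0 and torch.rand(()) < drop_prob:
                continue  # this block is dropped for the current forward pass
            h = h + block(h)
        return h

def s2_guidance_step(model, x_t, cond, w=7.5, w_s=1.0, p_drop=0.3):
    """One denoising prediction combining CFG with a stochastic sub-network.

    eps_cfg follows standard classifier-free guidance; eps_sub comes from a
    randomly pruned sub-network and is used as a negative guidance direction
    (assumed combination rule for illustration).
    """
    with torch.no_grad():
        eps_uncond = model(x_t, cond=None)
        eps_cond = model(x_t, cond=cond)
        eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)
        # Prediction from a stochastic sub-network of the same model.
        eps_sub = model(x_t, cond=cond, drop_prob=p_drop)
        # Steer away from the sub-network's (assumed lower-quality) prediction.
        return eps_cfg - w_s * eps_sub

x_t = torch.randn(4, 64)
cond = torch.randn(4, 64)
eps_hat = s2_guidance_step(ToyDenoiser(), x_t, cond)
print(eps_hat.shape)  # torch.Size([4, 64])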