

S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

August 18, 2025
作者: Chubin Chen, Jiashu Zhu, Xiaokun Feng, Nisha Huang, Meiqi Wu, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, Xiu Li
cs.AI

Abstract

Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis on Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation tasks demonstrate that S^2-Guidance delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
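
The abstract describes the mechanism only at a high level, so below is a minimal, hedged sketch of the idea in PyTorch: a standard CFG prediction is combined with a prediction from a stochastic sub-network obtained by randomly dropping blocks, and the final estimate is pushed away from that sub-network prediction. The toy `BlockDenoiser`, the negative-guidance-style combination rule, and the scales `cfg_scale`, `s2_scale`, and `drop_prob` are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of S^2-Guidance as described in the abstract: CFG plus a
# "steer away from the stochastic sub-network prediction" term. All names,
# scales, and the toy denoiser are assumptions for illustration only.
import torch
import torch.nn as nn


class BlockDenoiser(nn.Module):
    """Toy block-structured denoiser standing in for a diffusion backbone."""

    def __init__(self, dim: int = 64, n_blocks: int = 8):
        super().__init__()
        self.cond_proj = nn.Linear(dim, dim)
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.SiLU()) for _ in range(n_blocks)]
        )

    def forward(self, x: torch.Tensor, c: torch.Tensor | None = None,
                drop_prob: float = 0.0) -> torch.Tensor:
        # Inject (toy) conditioning; c=None plays the role of the null prompt.
        h = x if c is None else x + self.cond_proj(c)
        # Stochastic block-dropping: each residual block is skipped with
        # probability `drop_prob`, yielding a random sub-network per call.
        for block in self.blocks:
            if drop_prob > 0.0 and torch.rand(()) < drop_prob:
                continue
            h = h + block(h)
        return h


def s2_guided_prediction(model: BlockDenoiser, x_t: torch.Tensor, cond: torch.Tensor,
                         cfg_scale: float = 7.5, s2_scale: float = 1.0,
                         drop_prob: float = 0.3) -> torch.Tensor:
    # Standard classifier-free guidance.
    eps_cond = model(x_t, cond)
    eps_uncond = model(x_t, None)
    eps_cfg = eps_uncond + cfg_scale * (eps_cond - eps_uncond)

    # Prediction from a stochastic sub-network (random blocks dropped),
    # used here as a proxy for the model's low-quality tendency.
    eps_sub = model(x_t, cond, drop_prob=drop_prob)

    # Push the CFG estimate away from the sub-network prediction
    # (assumed negative-guidance-style combination).
    return eps_cfg + s2_scale * (eps_cfg - eps_sub)


if __name__ == "__main__":
    net = BlockDenoiser()
    x_t, cond = torch.randn(2, 64), torch.randn(2, 64)
    print(s2_guided_prediction(net, x_t, cond).shape)  # torch.Size([2, 64])
```

In an actual sampler the toy `BlockDenoiser` would be replaced by the text-to-image or text-to-video backbone, with only the combination rule at each denoising step carrying the guidance logic sketched above.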