S^2-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
August 18, 2025
Authors: Chubin Chen, Jiashu Zhu, Xiaokun Feng, Nisha Huang, Meiqi Wu, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, Xiu Li
cs.AI
Abstract
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis of Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation tasks demonstrate that S^2-Guidance delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
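
The abstract does not spell out the exact update rule, so the minimal PyTorch sketch below only illustrates the described mechanism: randomly dropping blocks of the network to obtain a stochastic sub-network, then using that sub-network's prediction as a negative guidance direction alongside standard CFG. ToyDenoiser, p_drop, w, w_s, and the combination eps_cfg - w_s * eps_sub are illustrative assumptions, not the authors' formulation.

# Illustrative sketch only; the architecture, drop probability, and the
# guidance combination rule below are assumptions, not the paper's method.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """A stack of residual blocks standing in for a diffusion backbone."""
    def __init__(self, dim=64, num_blocks=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
            for _ in range(num_blocks)
        )

    def forward(self, x, cond=None, drop_prob=0.0):
        # Randomly skipping residual blocks realizes a stochastic sub-network.
        h = x if cond is None else x + cond
        for block in self.blocks:
            if drop_prob > 0 and torch.rand(()) < drop_prob:
                continue  # this block is dropped for the current forward pass
            h = h + block(h)
        return h

def s2_guidance_step(model, x_t, cond, w=7.5, w_s=1.0, p_drop=0.3):
    """One denoising prediction combining CFG with a stochastic sub-network.

    eps_cfg follows standard classifier-free guidance; eps_sub comes from a
    randomly pruned sub-network and is used as a negative guidance direction
    (assumed combination rule for illustration).
    """
    with torch.no_grad():
        eps_uncond = model(x_t, cond=None)
        eps_cond = model(x_t, cond=cond)
        eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)
        # Prediction from a stochastic sub-network of the same model.
        eps_sub = model(x_t, cond=cond, drop_prob=p_drop)
        # Steer away from the sub-network's (assumed lower-quality) prediction.
        return eps_cfg - w_s * eps_sub

x_t = torch.randn(4, 64)
cond = torch.randn(4, 64)
eps_hat = s2_guidance_step(ToyDenoiser(), x_t, cond)
print(eps_hat.shape)  # torch.Size([4, 64])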