Coarse-Guided Visual Generation via Weighted h-Transform Sampling
March 12, 2026
Authors: Yanghao Wang, Ziqi Jiang, Zhen Wang, Long Chen
cs.AI
Abstract
Coarse-guided visual generation, which synthesizes fine visual samples from degraded or low-fidelity coarse references, is essential for various real-world applications. While training-based approaches are effective, they are inherently limited by high training costs and by the restricted generalization that comes with paired data collection. Accordingly, recent training-free works leverage pretrained diffusion models and incorporate guidance during the sampling process. However, these training-free methods either require knowledge of the forward (fine-to-coarse) transformation operator, e.g., bicubic downsampling, or struggle to balance guidance adherence against synthesis quality. To address these challenges, we propose a novel guidance method based on the h-transform, a mathematical tool that constrains a stochastic process (such as the sampling process) to satisfy desired conditions. Specifically, we modify the transition probability at each sampling timestep by adding a drift term to the original differential equation, which approximately steers the generation toward the ideal fine sample. To handle the unavoidable approximation error, we introduce a noise-level-aware schedule that gradually down-weights the drift term as the error grows, ensuring both guidance adherence and high-quality synthesis. Extensive experiments across diverse image and video generation tasks demonstrate the effectiveness and generalization of our method.
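For intuition, here is a minimal sketch of one guided sampling step of the kind the abstract describes: in Doob's h-transform, conditioning a diffusion adds a drift proportional to the gradient of log h_t(x) ≈ log p(y | x_t) to the reverse-time equation. Everything below is an illustrative assumption rather than the authors' implementation: the `score_model` interface, the differentiable `coarse_op` standing in for the (generally unknown) fine-to-coarse mapping, the squared-error surrogate likelihood, and the scalar weight `w` from a noise-level-aware schedule.

```python
import torch

def weighted_h_transform_step(x_t, t, score_model, coarse_op, y_coarse,
                              sigma, dt, w):
    """One reverse-diffusion Euler-Maruyama step with a weighted
    h-transform drift (illustrative sketch, not the authors' code).

    score_model(x, t) -> unconditional score estimate from a pretrained
    diffusion model; coarse_op is a differentiable stand-in for the
    fine-to-coarse transform; w is the noise-level-aware weight at t.
    Signs and coefficients depend on the chosen SDE parameterization.
    """
    x_t = x_t.detach().requires_grad_(True)

    # Unconditional reverse drift from the pretrained score model.
    score = score_model(x_t, t)

    # Approximate h-transform term: gradient of a surrogate
    # log h_t(x) ~ log p(y_coarse | x_t); a squared-error likelihood
    # is used here purely for illustration.
    log_h = -((coarse_op(x_t) - y_coarse) ** 2).sum()
    guidance = torch.autograd.grad(log_h, x_t)[0]

    # Weighted drift: the schedule w down-weights the guidance term
    # wherever its approximation error is expected to be large.
    drift = (sigma ** 2) * (score + w * guidance)

    # Stochastic Euler update of the guided reverse process.
    noise = torch.randn_like(x_t)
    x_next = x_t + drift * dt + sigma * (dt ** 0.5) * noise
    return x_next.detach()
```

Running such a step over the full reverse trajectory, with `w` shrinking at noise levels where the surrogate likelihood is least reliable, mirrors the trade-off the abstract highlights between guidance adherence and synthesis quality.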