Stable Consistency Tuning: Understanding and Improving Consistency Models
October 24, 2024
Authors: Fu-Yun Wang, Zhengyang Geng, Hongsheng Li
cs.AI
Abstract
Diffusion models achieve superior generation quality but suffer from slow
generation speed due to the iterative nature of denoising. In contrast,
consistency models, a new generative family, achieve competitive performance
with significantly faster sampling. These models are trained either through
consistency distillation, which leverages pretrained diffusion models, or
consistency training/tuning directly from raw data. In this work, we propose a
novel framework for understanding consistency models by modeling the denoising
process of the diffusion model as a Markov Decision Process (MDP) and framing
consistency model training as value estimation through Temporal Difference (TD)
learning. More importantly, this framework allows us to analyze the limitations
of current consistency training/tuning strategies. Building on Easy Consistency
Tuning (ECT), we propose Stable Consistency Tuning (SCT),
which incorporates variance-reduced learning using the score identity. SCT
leads to significant performance improvements on benchmarks such as CIFAR-10
and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID
1.55, a new SoTA for consistency models.
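To make the TD-learning framing concrete, the sketch below is a deliberately minimal NumPy toy, not the paper's algorithm: a linear stand-in replaces the network, a simple additive noising process replaces the diffusion schedule, and finite differences replace backpropagation. It only illustrates the bootstrapping structure the abstract describes, where the prediction at a noisier point is regressed toward a stop-gradient target evaluated at a slightly cleaner point on the same trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta, x, t):
    # Toy "consistency function": a linear stand-in for the network,
    # meant to map any noisy state (x, t) back to the clean sample.
    return theta[0] * x + theta[1] * t

def loss(theta, theta_tgt, x0, eps, t, dt):
    # Two adjacent points on the same noising trajectory (shared noise eps),
    # as in consistency training directly from raw data.
    x_t = x0 + t * eps                  # noisier point
    x_s = x0 + (t - dt) * eps           # slightly cleaner point
    target = f(theta_tgt, x_s, t - dt)  # TD-style bootstrap target (frozen)
    pred = f(theta, x_t, t)
    # Boundary condition f(x, 0) = x keeps the fixed point non-degenerate.
    boundary = np.mean((f(theta, x0, 0.0) - x0) ** 2)
    return np.mean((pred - target) ** 2) + boundary

def grad(fn, theta, h=1e-5):
    # Finite-difference gradient; a real implementation would backprop.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += h
        tm[i] -= h
        g[i] = (fn(tp) - fn(tm)) / (2 * h)
    return g

theta = np.array([0.5, 0.5])
x0 = rng.standard_normal(256)  # toy "dataset" of clean samples
losses = []
for _ in range(200):
    eps = rng.standard_normal(x0.shape)
    t, dt = rng.uniform(0.1, 1.0), 0.05
    theta_tgt = theta.copy()  # stop-gradient: target carries no gradient
    fn = lambda th: loss(th, theta_tgt, x0, eps, t, dt)
    losses.append(fn(theta))
    theta = theta - 0.1 * grad(fn, theta)
```

The copy into `theta_tgt` plays the role of the stop-gradient (or frozen target) in TD learning: only the prediction branch is differentiated, which is exactly the analogy to value bootstrapping that the proposed MDP framework formalizes.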