Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

October 14, 2024
Authors: Cheng Lu, Yang Song
cs.AI

Abstract

Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce additional hyperparameters and are prone to discretization errors. While continuous-time formulations can mitigate these issues, their success has been limited by training instability. To address this, we propose a simplified theoretical framework that unifies previous parameterizations of diffusion models and CMs, identifying the root causes of instability. Based on this analysis, we introduce key improvements in diffusion process parameterization, network architecture, and training objectives. These changes enable us to train continuous-time CMs at an unprecedented scale, reaching 1.5B parameters on ImageNet 512x512. Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64x64, and 1.88 on ImageNet 512x512, narrowing the gap in FID scores with the best existing diffusion models to within 10%.
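
The "two sampling steps" claim is worth unpacking: a consistency function f maps a noisy sample at any time directly to a clean-data estimate, so generation amounts to denoising pure noise once, partially re-noising the result, and denoising once more. Below is a minimal PyTorch sketch of that procedure. The trigonometric interpolation and the default values of `sigma_d`, `t_max`, and `t_mid` are illustrative assumptions, not settings taken from the paper, and `f` stands in for a trained consistency network.

```python
import math

import torch


def two_step_sample(f, shape, sigma_d=0.5, t_max=math.pi / 2, t_mid=1.1,
                    device="cpu"):
    """Two-step consistency-model sampling (an illustrative sketch).

    `f(x, t)` is assumed to be a trained consistency function that maps
    a noisy sample at time `t` straight to a clean-data estimate. The
    defaults for `sigma_d`, `t_max`, and `t_mid` are placeholders, not
    values taken from the paper.
    """
    # Step 1: one network evaluation maps pure noise at the maximum
    # time to an initial clean-data estimate.
    z = sigma_d * torch.randn(shape, device=device)
    x0 = f(z, t_max)

    # Step 2: re-noise the estimate to an intermediate time, then
    # denoise again to refine it. The trigonometric interpolation
    # below is an assumption (a TrigFlow-style schedule); EDM-style
    # consistency models instead use x0 + t * noise.
    noise = sigma_d * torch.randn(shape, device=device)
    t = torch.tensor(t_mid)
    x_t = torch.cos(t) * x0 + torch.sin(t) * noise
    return f(x_t, t_mid)
```

Given a trained model, a call like `two_step_sample(model, (16, 3, 512, 512))` would produce a batch of samples in exactly two network evaluations, the budget under which the FID scores above are reported.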

