SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
January 16, 2024
Authors: Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, Saining Xie
cs.AI
Abstract
We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which allows for connecting two distributions in a more flexible way than standard diffusion models, makes possible a modular study of various design choices impacting generative models built on dynamical transport: using discrete vs. continuous time learning, deciding the objective for the model to learn, choosing the interpolant connecting the distributions, and deploying a deterministic or stochastic sampler. By carefully introducing the above ingredients, SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256 benchmark using the exact same backbone, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves an FID-50K score of 2.06.
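
To make the ingredients above concrete, here is a minimal sketch of the interpolant recipe in PyTorch. It assumes a linear interpolant x_t = (1 - t) x + t ε, a velocity-prediction objective, and a hypothetical velocity_model(x, t) network; none of these names come from the paper's code, and the sketch illustrates the framework rather than reproducing the authors' implementation.

```python
import torch

def training_loss(velocity_model, x_data):
    """Velocity-matching loss under the linear interpolant
    x_t = (1 - t) * x_data + t * eps, whose time derivative is eps - x_data.
    `velocity_model` is a hypothetical network taking (x_t, t)."""
    b = x_data.shape[0]
    # Sample one time per example and broadcast it over the remaining dims.
    t = torch.rand(b, device=x_data.device).view(b, *([1] * (x_data.dim() - 1)))
    eps = torch.randn_like(x_data)
    x_t = (1 - t) * x_data + t * eps      # interpolant connecting data and noise
    target = eps - x_data                 # d/dt x_t for this linear interpolant
    pred = velocity_model(x_t, t.flatten())
    return ((pred - target) ** 2).mean()

@torch.no_grad()
def sample_ode(velocity_model, shape, steps=250, device="cpu"):
    """Deterministic sampler: integrate dx/dt = v(x, t) backward from
    t = 1 (pure noise) to t = 0 (data) with explicit Euler steps."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps, 0, -1):
        t = torch.full((shape[0],), i / steps, device=device)
        x = x - dt * velocity_model(x, t)  # Euler step backward in time
    return x
```

Because the score is recoverable from the learned velocity in this framework, the same trained model can also be driven by a stochastic (SDE) sampler whose diffusion coefficient is chosen at sampling time; that sampling-time knob, tuned separately from learning, is what the abstract credits for the FID-50K score of 2.06.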