SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Abstract

We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which allows for connecting two distributions in a more flexible way than standard diffusion models, makes possible a modular study of various design choices impacting generative models built on dynamical transport: using discrete vs. continuous time learning, deciding the objective for the model to learn, choosing the interpolant connecting the distributions, and deploying a deterministic or stochastic sampler. By carefully introducing the above ingredients, SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256 benchmark using the exact same backbone, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves an FID-50K score of 2.06.
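The abstract does not say which interpolant or learning objective is used, so the following is only a minimal sketch of one possible instance of "choosing the interpolant connecting the distributions". It assumes a linear interpolant x_t = (1 - t) * x_data + t * noise, with t = 0 at the data and t = 1 at the noise, and a velocity-style regression target equal to the time derivative of x_t. The function name `linear_interpolant` and the time convention are illustrative assumptions, not taken from the source.

```python
import numpy as np

def linear_interpolant(x_data, noise, t):
    """Return the interpolated point x_t and its time derivative.

    Assumed convention (hypothetical, not from the abstract):
    x_t = alpha(t) * x_data + sigma(t) * noise,
    with alpha(t) = 1 - t and sigma(t) = t, so t=0 is data and t=1 is noise.
    """
    alpha, sigma = 1.0 - t, t
    d_alpha, d_sigma = -1.0, 1.0  # time derivatives of alpha(t) and sigma(t)
    x_t = alpha * x_data + sigma * noise
    velocity = d_alpha * x_data + d_sigma * noise  # d x_t / dt
    return x_t, velocity

rng = np.random.default_rng(0)
x_data = rng.normal(loc=3.0, size=(4, 2))  # stand-in for data samples
noise = rng.normal(size=(4, 2))            # samples from the Gaussian base

x_start, v = linear_interpolant(x_data, noise, t=0.0)
x_end, _ = linear_interpolant(x_data, noise, t=1.0)
```

Under these assumptions the endpoints recover the two distributions exactly, and a model trained to regress `velocity` from `(x_t, t)` could be integrated as an ODE for deterministic sampling. Other choices of alpha(t) and sigma(t) would slot into the same function.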
