Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Abstract

We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency. While improved network architectures and inference algorithms have been shown to effectively boost the sampling efficiency of diffusion models, the role of model size, a critical determinant of sampling efficiency, has not been thoroughly examined. Through empirical analysis of established text-to-image diffusion models, we conduct an in-depth investigation into how model size influences sampling efficiency across varying numbers of sampling steps. Our findings unveil a surprising trend: when operating under a given inference budget, smaller models frequently outperform their larger equivalents in generating high-quality results. Moreover, we extend our study to demonstrate the generalizability of these findings by applying various diffusion samplers, exploring diverse downstream tasks, evaluating post-distilled models, and comparing performance relative to training compute. These findings open up new pathways for the development of LDM scaling strategies that can be employed to enhance generative capabilities within limited inference budgets.
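To make the phrase "a given inference budget" concrete, the following is a minimal sketch of the accounting involved. It assumes the budget is measured as per-step cost multiplied by the number of sampling steps. The model names, the per-step costs, and the budget value are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_step: float  # hypothetical relative cost of one denoising step


def affordable_steps(model: Model, budget: float) -> int:
    """Largest whole number of sampling steps whose total cost fits the budget."""
    return int(budget // model.cost_per_step)


# Hypothetical models; the costs are illustrative, not taken from the paper.
models = [Model("small", 1.0), Model("medium", 4.0), Model("large", 16.0)]
budget = 64.0  # hypothetical budget in the same relative units

steps = {m.name: affordable_steps(m, budget) for m in models}
for name, n in steps.items():
    print(f"{name}: {n} sampling steps within budget {budget}")
```

Under a fixed budget, a cheaper model can afford more sampling steps than a costlier one. Whether those extra steps yield better samples is the empirical question the study addresses; the sketch only shows how the budget constrains the comparison.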
