Efficient Generative Model Training via Embedded Representation Warmup

Abstract


Diffusion models excel at generating high-dimensional data but fall short of self-supervised methods in training efficiency and representation quality. We identify a key bottleneck: the underutilization of high-quality, semantically rich representations during training notably slows convergence. Our systematic analysis reveals a critical representation processing region, located primarily in the early layers, where semantic and structural pattern learning takes place before generation can occur. To address this, we propose Embedded Representation Warmup (ERW), a plug-and-play framework in which, during the first stage, the ERW module serves as a warmup that initializes the early layers of the diffusion model with high-quality, pretrained representations. This warmup reduces the burden of learning representations from scratch, thereby accelerating convergence and boosting performance. Our theoretical analysis demonstrates that ERW's efficacy depends on its precise integration into specific neural network layers, termed the representation processing region, where the model primarily processes and transforms feature representations for later generation. We further establish that ERW not only accelerates training convergence but also enhances representation quality: empirically, our method achieves a 40× acceleration in training speed compared to REPA, the current state-of-the-art method. Code is available at https://github.com/LINs-lab/ERW.
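The core idea of the warmup stage is that the early layers (the representation processing region) start from pretrained representations rather than from scratch. The minimal sketch below illustrates only that initialization step. It is not the authors' implementation (that lives in the linked repository). It assumes a hypothetical layout in which each network block is a dictionary mapping parameter names to flat lists of floats. The names `erw_warmup_init` and `num_early_layers` are illustrative only.

```python
import copy
from typing import Dict, List

# Hypothetical parameter layout (assumption): each block maps a parameter
# name to a flat list of floats. Real frameworks use tensors instead.
Block = Dict[str, List[float]]


def erw_warmup_init(
    diffusion_blocks: List[Block],
    warmup_blocks: List[Block],
    num_early_layers: int,
) -> List[Block]:
    """Return a copy of `diffusion_blocks` whose first `num_early_layers`
    blocks take their values from `warmup_blocks` (assumed to hold the
    pretrained, warmed-up representations). Later blocks are left as-is."""
    if num_early_layers > min(len(diffusion_blocks), len(warmup_blocks)):
        raise ValueError("num_early_layers exceeds the available blocks")

    # Deep copy so the caller's original model is not mutated.
    result = copy.deepcopy(diffusion_blocks)
    for i in range(num_early_layers):
        src, dst = warmup_blocks[i], result[i]
        if src.keys() != dst.keys():
            raise ValueError(f"block {i}: parameter names differ")
        for name, values in src.items():
            if len(values) != len(dst[name]):
                raise ValueError(f"block {i}, {name}: shape mismatch")
            # Copy the list so later training updates do not alias the source.
            dst[name] = list(values)
    return result


if __name__ == "__main__":
    diffusion = [{"w": [0.0, 0.0]} for _ in range(4)]
    warmup = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(erw_warmup_init(diffusion, warmup, num_early_layers=2))
    # prints [{'w': [1.0, 2.0]}, {'w': [3.0, 4.0]}, {'w': [0.0, 0.0]}, {'w': [0.0, 0.0]}]
```

Only the first two blocks are overwritten. The deeper blocks, which the abstract associates with later generation, keep their own initialization. The shape and name checks reflect the requirement that the warmup be integrated into matching layers of the diffusion model.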
