DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion

Abstract

Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks, which limits access to state-of-the-art AI research. We propose DiffusionBlocks, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process. The network is partitioned into independently trainable blocks, and the noise level assignments are optimized based on equal cumulative probability mass. With this design, our approach achieves significant memory efficiency while maintaining performance competitive with traditional backpropagation on generative tasks. Experiments on image generation and language modeling tasks demonstrate memory reduction proportional to the number of blocks while achieving superior performance. DiffusionBlocks provides a promising pathway for democratizing access to large-scale neural network training with limited computational resources.
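As a rough illustration of what assigning noise levels by equal cumulative probability mass could look like, the sketch below splits a noise range into per-block intervals that each carry the same probability mass. It is not the authors' implementation. The abstract does not specify the noise distribution, so the log-normal prior over the noise level, the parameter values (`p_mean`, `p_std`, `sigma_min`, `sigma_max`), and the function names `equal_mass_boundaries` and `block_for_sigma` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def equal_mass_boundaries(num_blocks, p_mean=-1.2, p_std=1.2,
                          sigma_min=0.002, sigma_max=80.0):
    """Split [sigma_min, sigma_max] into intervals of equal probability mass.

    ASSUMPTION: log(sigma) ~ Normal(p_mean, p_std**2), truncated to the range.
    The defaults are illustrative values, not taken from the source.
    """
    dist = NormalDist(mu=p_mean, sigma=p_std)
    cdf_lo = dist.cdf(math.log(sigma_min))
    cdf_hi = dist.cdf(math.log(sigma_max))

    boundaries = [sigma_min]
    for i in range(1, num_blocks):
        # Target cumulative mass for the i-th interior boundary.
        target = cdf_lo + (cdf_hi - cdf_lo) * i / num_blocks
        boundaries.append(math.exp(dist.inv_cdf(target)))
    boundaries.append(sigma_max)
    return boundaries


def block_for_sigma(sigma, boundaries):
    """Return the index of the block whose noise interval contains sigma."""
    for i in range(len(boundaries) - 1):
        if sigma <= boundaries[i + 1]:
            return i
    return len(boundaries) - 2  # sigma above the range: use the last block


# Hypothetical usage: four blocks, each responsible for one noise interval.
boundaries = equal_mass_boundaries(num_blocks=4)
block_index = block_for_sigma(0.5, boundaries)
```

Under these assumptions, a noise level sampled from the prior is equally likely to land in any block's interval, so every block sees a comparable share of training samples. Since each block is then trained on its own interval, only that block's activations and gradients need to be held in memory at a time, which is consistent with the memory reduction the abstract describes.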
