Pyramidal Flow Matching for Efficient Video Generative Modeling

Abstract

Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage. To reduce the complexity, the prevailing approaches employ a cascaded architecture to avoid direct training with full resolution. Despite reducing computational demands, the separate optimization of each sub-stage hinders knowledge sharing and sacrifices flexibility. This work introduces a unified pyramidal flow matching algorithm. It reinterprets the original denoising trajectory as a series of pyramid stages, where only the final stage operates at the full resolution, thereby enabling more efficient video generative modeling. Through our sophisticated design, the flows of different pyramid stages can be interlinked to maintain continuity. Moreover, we craft autoregressive video generation with a temporal pyramid to compress the full-resolution history. The entire framework can be optimized in an end-to-end manner and with a single unified Diffusion Transformer (DiT). Extensive experiments demonstrate that our method supports generating high-quality 5-second (up to 10-second) videos at 768p resolution and 24 FPS within 20.7k A100 GPU training hours. All code and models will be open-sourced at https://pyramid-flow.github.io.
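The core idea is that the denoising trajectory is split into pyramid stages, and only the last stage runs at full resolution. The sketch below shows one way to build such a piecewise flow for a training step. It is pure Python and is not the authors' implementation. It rests on assumptions the abstract does not state:

- a rectified-flow-style linear path where t = 0 is noise and t = 1 is data;
- equal-length time windows per stage;
- 2x2 average pooling as the downsampling operator and nearest-neighbour upsampling;
- one noise sample shared by a stage's start and end points.

All function names are hypothetical. The sketch does not model how the noise component is matched across stage boundaries, which the abstract says the design interlinks to maintain continuity. It also omits the temporal pyramid.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]


def downsample(x: Grid) -> Grid:
    """Assumed Down operator: 2x2 average pooling (side lengths must be even)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample(x: Grid) -> Grid:
    """Assumed Up operator: nearest-neighbour 2x upsampling."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def down_n(x: Grid, times: int) -> Grid:
    """Apply the Down operator `times` times (resolution divided by 2**times)."""
    for _ in range(times):
        x = downsample(x)
    return x


def lerp(a: Grid, b: Grid, t: float) -> Grid:
    """Elementwise (1 - t) * a + t * b."""
    return [[(1 - t) * va + t * vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


def stage_window(k: int, num_stages: int) -> Tuple[float, float]:
    """Equal time windows (an assumption); k = num_stages - 1 is the coarsest stage, run first."""
    i = num_stages - 1 - k
    return i / num_stages, (i + 1) / num_stages


def stage_endpoints(x1: Grid, k: int, num_stages: int, noise: Grid) -> Tuple[Grid, Grid]:
    """Start/end points of stage k at resolution 1/2**k, sharing one noise sample."""
    s, e = stage_window(k, num_stages)
    start = lerp(noise, upsample(down_n(x1, k + 1)), s)  # coarser content, upsampled
    end = lerp(noise, down_n(x1, k), e)                   # content at this stage's resolution
    return start, end


def training_pair(x1: Grid, num_stages: int, rng: random.Random):
    """Sample (stage, x_t, velocity target) for one hypothetical flow-matching step."""
    k = rng.randrange(num_stages)
    h, w = len(x1) >> k, len(x1[0]) >> k
    noise = [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]
    start, end = stage_endpoints(x1, k, num_stages, noise)
    t_local = rng.random()                 # position inside the stage's window
    x_t = lerp(start, end, t_local)        # straight path between the two endpoints
    target = [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(start, end)]  # d x_t / d t_local
    return k, x_t, target


def mean_token_fraction(num_stages: int) -> float:
    """Average tokens per step vs. full resolution, equal windows: stage k keeps 4**-k."""
    return sum(4.0 ** -k for k in range(num_stages)) / num_stages
```

Under these assumptions, the data component of a stage's end point, once upsampled, equals the data component of the next stage's start point. With three stages and equal windows, the average token count per step is (1 + 1/4 + 1/16) / 3 = 0.4375 of full resolution. Attention cost grows faster than linearly in the number of tokens, so the actual compute saving depends on the architecture.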
