Open-Sora Plan: Open-Source Large Video Generation Model

Abstract

We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model capable of producing high-resolution, long-duration videos from a variety of user inputs. The project comprises multiple components covering the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. In addition, we design a number of auxiliary strategies for efficient training and inference, and propose a multi-dimensional data curation pipeline for obtaining high-quality data. Benefiting from these efficiency-oriented designs, Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope that our careful design and practical experience can inspire the video generation research community. All code and model weights are publicly available at https://github.com/PKU-YuanGroup/Open-Sora-Plan.
