AtomoVideo: High Fidelity Image-to-Video Generation

Abstract

Video generation has recently advanced rapidly, building on superior text-to-image generation techniques. In this work, we propose AtomoVideo, a high-fidelity framework for image-to-video generation. Based on multi-granularity image injection, we achieve higher fidelity of the generated video to the given image. In addition, thanks to high-quality datasets and training strategies, we achieve greater motion intensity while maintaining superior temporal consistency and stability. Our architecture extends flexibly to the video frame prediction task, enabling long sequence prediction through iterative generation. Furthermore, owing to the design of adapter training, our approach can be combined well with existing personalised models and controllable modules. In quantitative and qualitative evaluations, AtomoVideo achieves superior results compared to popular methods. More examples can be found on our project website: https://atomo-video.github.io/.
