AtomoVideo: High Fidelity Image-to-Video Generation

Abstract


Recently, video generation has developed rapidly, building on advances in text-to-image generation. In this work, we propose AtomoVideo, a high-fidelity framework for image-to-video generation. Through multi-granularity image injection, we achieve higher fidelity of the generated video to the given image. In addition, thanks to high-quality datasets and training strategies, we achieve greater motion intensity while maintaining superior temporal consistency and stability. Our architecture extends flexibly to the video frame prediction task, enabling long-sequence prediction through iterative generation. Furthermore, thanks to its adapter-based training design, our approach combines well with existing personalised models and controllable modules. In quantitative and qualitative evaluations, AtomoVideo achieves superior results compared with popular methods. More examples can be found on our project website: https://atomo-video.github.io/.
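The abstract states that long sequences are produced through iterative generation. The loop below is a minimal sketch of that general idea, assuming a hypothetical `predict_next` callable that maps the most recent `num_cond` frames to a chunk of newly generated frames. The function names, the chunking scheme, and the toy integer "frames" are illustrative assumptions, not AtomoVideo's actual implementation.

```python
from typing import Callable, List


def iterative_predict(
    initial_frames: List[int],
    predict_next: Callable[[List[int]], List[int]],
    num_cond: int,
    target_length: int,
) -> List[int]:
    """Extend a frame sequence by repeatedly conditioning on its latest frames.

    `predict_next` is a hypothetical stand-in for one pass of a frame
    prediction model (an assumption, not the paper's API): it receives
    `num_cond` conditioning frames and returns a chunk of new frames.
    """
    if num_cond <= 0 or len(initial_frames) < num_cond:
        raise ValueError("need at least num_cond initial frames")
    video = list(initial_frames)
    while len(video) < target_length:
        context = video[-num_cond:]          # most recent frames act as the condition
        new_frames = predict_next(context)   # one generation pass
        if not new_frames:
            raise RuntimeError("predict_next returned no frames")
        video.extend(new_frames)             # append the chunk and iterate
    return video[:target_length]             # trim any overshoot from the last chunk


def toy_predict(context: List[int]) -> List[int]:
    """Toy predictor: frames are integers, each pass emits the next four."""
    last = context[-1]
    return [last + i for i in range(1, 5)]


print(iterative_predict([0, 1], toy_predict, num_cond=2, target_length=10))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each pass only sees a short window of recent frames, so the sequence can grow well beyond the length a single generation pass produces; in a real model, errors can also accumulate across passes.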
