AtomoVideo: High Fidelity Image-to-Video Generation

Abstract

Recently, video generation has developed rapidly, building on strong text-to-image generation techniques. In this work, we propose AtomoVideo, a high-fidelity framework for image-to-video generation. Using multi-granularity image injection, we achieve higher fidelity of the generated video to the given image. In addition, thanks to high-quality datasets and training strategies, we achieve greater motion intensity while maintaining superior temporal consistency and stability. Our architecture extends flexibly to the video frame prediction task, enabling long-sequence prediction through iterative generation. Furthermore, owing to its adapter-training design, our approach combines well with existing personalised models and controllable modules. In both quantitative and qualitative evaluations, AtomoVideo achieves superior results compared to popular methods. More examples can be found on our project website: https://atomo-video.github.io/.
