AtomoVideo: High Fidelity Image-to-Video Generation

Abstract


Recently, video generation has advanced rapidly, building on powerful text-to-image generation techniques. In this work, we propose AtomoVideo, a high-fidelity framework for image-to-video generation. Through multi-granularity image injection, AtomoVideo achieves higher fidelity of the generated video to the given image. In addition, thanks to high-quality datasets and training strategies, it achieves greater motion intensity while maintaining superior temporal consistency and stability. Our architecture extends flexibly to the video frame prediction task, enabling long-sequence prediction through iterative generation. Furthermore, owing to its adapter-based training design, our approach combines well with existing personalised models and controllable modules. In quantitative and qualitative evaluations, AtomoVideo achieves superior results compared to popular methods. More examples can be found on our project website: https://atomo-video.github.io/.
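The idea of producing long sequences through iterative frame prediction can be illustrated with a minimal sketch. It assumes a hypothetical predictor `predict_next_frames` that takes a window of the most recent frames and returns a chunk of new frames; `generate_long_video`, `context_size`, the list-of-floats frame type, and `toy_predictor` are all illustrative stand-ins, not AtomoVideo's actual model, interface, or settings.

```python
from typing import Callable, List, Sequence

# Hypothetical frame representation; a real model would use image/latent tensors.
Frame = List[float]


def generate_long_video(
    first_frames: Sequence[Frame],
    predict_next_frames: Callable[[Sequence[Frame]], List[Frame]],
    target_length: int,
    context_size: int,
) -> List[Frame]:
    """Extend a clip by repeatedly predicting new frames from the latest ones."""
    if context_size <= 0:
        raise ValueError("context_size must be positive")
    video = list(first_frames)
    if not video:
        raise ValueError("need at least one conditioning frame")
    while len(video) < target_length:
        # Condition each step only on the most recent frames (sliding window).
        context = video[-context_size:]
        new_frames = predict_next_frames(context)
        if not new_frames:
            raise RuntimeError("predictor returned no frames")
        video.extend(new_frames)
    # The last chunk may overshoot the requested length, so trim it.
    return video[:target_length]


def toy_predictor(context: Sequence[Frame]) -> List[Frame]:
    """Hypothetical stand-in model: continues the last value for 4 more frames."""
    last = context[-1][0]
    return [[last + i] for i in range(1, 5)]


clip = generate_long_video([[0.0]], toy_predictor, target_length=10, context_size=2)
print([frame[0] for frame in clip])
```

Here each iteration sees only the latest `context_size` frames, so memory per step stays constant no matter how long the video grows; the trade-off is that content outside the window can only influence later frames indirectly.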
