SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
October 31, 2023
Authors: Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu
cs.AI
Abstract
Recently, video generation has achieved substantial progress with realistic
results. Nevertheless, existing AI-generated videos are usually very short
clips ("shot-level") depicting a single scene. To deliver a coherent long video
("story-level"), it is desirable to have creative transition and prediction
effects across different clips. This paper presents a short-to-long video
diffusion model, SEINE, that focuses on generative transition and prediction.
The goal is to generate high-quality long videos with smooth and creative
transitions between scenes and varying lengths of shot-level videos.
Specifically, we propose a random-mask video diffusion model to automatically
generate transitions based on textual descriptions. By providing the images of
different scenes as inputs, combined with text-based control, our model
generates transition videos that ensure coherence and visual quality.
Furthermore, the model can be readily extended to various tasks such as
image-to-video animation and autoregressive video prediction. To conduct a
comprehensive evaluation of this new generative task, we propose three
assessment criteria for smooth and creative transitions: temporal consistency,
semantic similarity, and video-text semantic alignment. Extensive experiments
validate the effectiveness of our approach over existing methods for generative
transition and prediction, enabling the creation of story-level long videos.
Project page: https://vchitect.github.io/SEINE-project/.
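
To make the core idea concrete, below is a minimal, hypothetical sketch of the masked-frame conditioning described in the abstract: the last frame of one scene and the first frame of the next are kept visible, all intermediate frames are masked out, and the diffusion model fills in the transition under text guidance. Function names, tensor shapes, and the conditioning interface are illustrative assumptions, not the authors' released implementation.

```python
import torch

def build_transition_condition(scene_a_frame, scene_b_frame, num_frames=16):
    """Assemble the conditioning pair (masked video, mask) for a transition.

    scene_a_frame, scene_b_frame: (C, H, W) tensors in [-1, 1], taken from the
    two shot-level clips that the transition should connect. Shapes and the
    frame count are illustrative assumptions.
    """
    c, h, w = scene_a_frame.shape
    video = torch.zeros(num_frames, c, h, w)      # masked frames are zeroed out
    mask = torch.zeros(num_frames, 1, h, w)       # 1 = observed frame, 0 = to generate
    video[0] = scene_a_frame                      # anchor the start on scene A
    video[-1] = scene_b_frame                     # anchor the end on scene B
    mask[0] = 1.0
    mask[-1] = 1.0
    # In a masked video diffusion model of this kind, the masked video and the
    # mask are fed to the denoiser together with the noisy latents and a text
    # prompt describing the desired transition; only the masked frames are
    # synthesized.
    return video, mask

# Per the abstract, training uses random masking (an arbitrary subset of frames
# kept visible), which is what lets the same model also handle image-to-video
# animation (keep only the first frame) and autoregressive video prediction
# (keep the trailing frames of the previous clip).
```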