Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Abstract

We present a method for generating video sequences with coherent motion between a pair of input keyframes. We adapt a pretrained large-scale image-to-video diffusion model, originally trained to generate videos moving forward in time from a single input image, for keyframe interpolation, i.e., producing a video in between two input frames. We accomplish this adaptation through a lightweight fine-tuning technique that produces a version of the model that instead predicts videos moving backward in time from a single input image. This model, along with the original forward-moving model, is then used in a dual-directional diffusion sampling process that combines the overlapping model estimates starting from each of the two keyframes. Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
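
To make the idea of combining overlapping estimates from the two keyframes concrete, the following is a minimal sketch, not the authors' implementation. It assumes each model yields a per-frame estimate as a list of numbers, that the backward-in-time estimate is indexed starting from the second keyframe, and that the two are fused by plain averaging. The function name `fuse_bidirectional` and the averaging rule are hypothetical choices made for illustration only; the abstract does not specify how the estimates are combined.

```python
from typing import List

Frame = List[float]  # toy stand-in for one (latent) video frame
Video = List[Frame]


def fuse_bidirectional(forward_est: Video, backward_est: Video) -> Video:
    """Combine a forward-in-time and a backward-in-time estimate.

    forward_est[i] is frame i counted forward from the first keyframe.
    backward_est[i] is frame i counted backward from the second keyframe,
    so it is reversed first to place both estimates on one timeline.
    Plain averaging is an assumption of this sketch, not a detail
    taken from the source.
    """
    if len(forward_est) != len(backward_est):
        raise ValueError("both estimates must cover the same number of frames")

    # Reverse the backward estimate so index i refers to the same time step.
    aligned_backward = backward_est[::-1]

    # Average the two estimates element by element, frame by frame.
    return [
        [(f + b) / 2.0 for f, b in zip(f_frame, b_frame)]
        for f_frame, b_frame in zip(forward_est, aligned_backward)
    ]


if __name__ == "__main__":
    # Toy one-value "frames": the forward estimate starts at the first
    # keyframe, the backward estimate starts at the second keyframe.
    forward = [[0.0], [1.0], [2.0]]
    backward = [[4.0], [3.0], [2.0]]
    print(fuse_bidirectional(forward, backward))
```

In a diffusion sampler, a fusion step of this kind would presumably be applied at every denoising step, so that both keyframes constrain the whole sequence rather than only its two ends.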
