Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
August 27, 2024
Authors: Xiaojuan Wang, Boyang Zhou, Brian Curless, Ira Kemelmacher-Shlizerman, Aleksander Holynski, Steven M. Seitz
cs.AI
Abstract
We present a method for generating video sequences with coherent motion
between a pair of input key frames. We adapt a pretrained large-scale
image-to-video diffusion model (originally trained to generate videos moving
forward in time from a single input image) for key frame interpolation, i.e.,
to produce a video in between two input frames. We accomplish this adaptation
through a lightweight fine-tuning technique that produces a version of the
model that instead predicts videos moving backwards in time from a single input
image. This model (along with the original forward-moving model) is
subsequently used in a dual-directional diffusion sampling process that
combines the overlapping model estimates starting from each of the two
keyframes. Our experiments show that our method outperforms both existing
diffusion-based methods and traditional frame interpolation techniques.
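The dual-directional sampling idea described above can be sketched in a few lines. The snippet below is a minimal, simplified illustration, not the paper's implementation: `forward_eps` and `backward_eps` are hypothetical placeholders standing in for the pretrained forward-moving model and the fine-tuned backward-moving model, and the update rule omits the real sampler's noise-schedule scaling. The key point it shows is the fusion step: the backward model is evaluated on the time-reversed clip, its estimate is flipped back to forward order, and the two overlapping estimates are blended at each denoising step.

```python
import numpy as np

def forward_eps(video, t):
    # Placeholder for the forward-moving model's noise estimate
    # (conditioned on the first keyframe). Hypothetical stand-in.
    return 0.1 * video + 0.01 * t

def backward_eps(video, t):
    # Placeholder for the fine-tuned backward-moving model's noise
    # estimate (conditioned on the last keyframe). Hypothetical stand-in.
    return 0.1 * video - 0.01 * t

def dual_directional_step(video, t, alpha=0.5):
    """One simplified denoising step fusing both models' estimates.

    The backward model sees the clip reversed along the frame axis,
    so its prediction is flipped back before blending.
    """
    eps_fwd = forward_eps(video, t)
    eps_bwd = backward_eps(video[::-1], t)[::-1]  # un-reverse frame axis
    eps = alpha * eps_fwd + (1.0 - alpha) * eps_bwd
    # Real samplers rescale by the noise schedule; omitted here.
    return video - eps

# Toy example: a "video" of 8 frames of 4x4 single-channel noise.
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 4, 4))
for t in range(10, 0, -1):
    video = dual_directional_step(video, t)
```

Blending time-reversed estimates this way is what lets a single-image-conditioned model respect both endpoint keyframes at once, which is the core of the sampling process the abstract describes.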