Sci-Fi: Symmetric Constraint for Frame Inbetweening

Abstract

Frame inbetweening aims to synthesize intermediate video sequences conditioned on given start and end frames. Current state-of-the-art methods mainly extend large-scale pre-trained Image-to-Video Diffusion Models (I2V-DMs) by incorporating an end-frame constraint, either through direct fine-tuning or without any additional training. We identify a critical limitation in this design: the end-frame constraint is usually injected through the same mechanism that originally imposed the start-frame (single-image) constraint. Because the original I2V-DMs have already been adequately trained on the start-frame condition, naively introducing the end-frame constraint through the same mechanism, with far less (or even zero) specialized training, is unlikely to give the end frame as strong an influence on the intermediate content as the start frame has. This asymmetry in the control the two frames exert over the intermediate content is likely to cause inconsistent motion or appearance collapse in the generated frames. To achieve symmetric start-frame and end-frame constraints efficiently, we propose a novel framework, termed Sci-Fi, which applies a stronger injection to the constraint that has received less training. Specifically, it handles the start-frame constraint as before, while introducing the end-frame constraint through an improved mechanism. The new mechanism is based on a lightweight module, named EF-Net, which encodes only the end frame and expands it into temporally adaptive frame-wise features that are injected into the I2V-DM. This makes the end-frame constraint as strong as the start-frame constraint, enabling Sci-Fi to produce more harmonious transitions in various scenarios. Extensive experiments demonstrate the superiority of Sci-Fi over other baselines.
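
The abstract does not describe the internals of EF-Net, so the following is only a toy illustration of the general idea of expanding a single end-frame encoding into temporally adaptive frame-wise features and injecting them into a backbone. Everything in it is an assumption made for illustration: the sinusoidal time embedding, the scale-and-shift modulation, the additive injection, the random weights standing in for learned parameters, and all function names. It is not the authors' implementation.

```python
import math
import random


def time_embedding(t, dim):
    """Sinusoidal embedding of frame index t (dim must be even). Assumed, not from the paper."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def matvec(vec, mat):
    """Multiply a length-D vector by a D x C matrix, giving a length-C vector."""
    return [sum(v * row[c] for v, row in zip(vec, mat)) for c in range(len(mat[0]))]


def expand_end_frame(end_feat, num_frames, w_scale, w_shift):
    """Turn one end-frame feature vector into num_frames frame-wise vectors.

    Each frame index gets its own scale and shift, so the same end-frame
    encoding contributes differently at different time steps.
    """
    dim = len(w_scale)
    frames = []
    for t in range(num_frames):
        emb = time_embedding(t, dim)
        scale = matvec(emb, w_scale)
        shift = matvec(emb, w_shift)
        frames.append([f * (1.0 + s) + b for f, s, b in zip(end_feat, scale, shift)])
    return frames


def inject(hidden, frame_feats):
    """Hypothetical injection: add frame-wise features to the backbone's hidden states."""
    return [[h + e for h, e in zip(h_row, e_row)]
            for h_row, e_row in zip(hidden, frame_feats)]


random.seed(0)
T, C, D = 8, 4, 6  # frames, feature channels, time-embedding size (all arbitrary)
end_feat = [random.gauss(0, 1) for _ in range(C)]   # stand-in for an encoded end frame
w_scale = [[0.1 * random.gauss(0, 1) for _ in range(C)] for _ in range(D)]
w_shift = [[0.1 * random.gauss(0, 1) for _ in range(C)] for _ in range(D)]

frame_feats = expand_end_frame(end_feat, T, w_scale, w_shift)
hidden = [[0.0] * C for _ in range(T)]              # stand-in for backbone activations
out = inject(hidden, frame_feats)
print(len(out), len(out[0]))
```

In this sketch the start frame would continue to condition the backbone through its original pathway, and only the end frame passes through the extra module, which mirrors the division of roles described in the abstract.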
