
Sci-Fi: Symmetric Constraint for Frame Inbetweening

May 27, 2025
Authors: Liuhan Chen, Xiaodong Cun, Xiaoyu Li, Xianyi He, Shenghai Yuan, Jie Chen, Ying Shan, Li Yuan
cs.AI

Abstract

Frame inbetweening aims to synthesize the intermediate video sequence between a given start frame and end frame. Current state-of-the-art methods mainly extend large-scale pre-trained Image-to-Video Diffusion Models (I2V-DMs), incorporating the end-frame constraint either by direct fine-tuning or without any training. We identify a critical limitation in this design: the end-frame constraint is usually injected through the same mechanism that originally imposed the start-frame (single-image) constraint. Because the original I2V-DMs have already been adequately trained on the start-frame condition, naively introducing the end-frame constraint through the same mechanism, with much less (or even zero) specialized training, probably cannot give the end frame as strong an influence on the intermediate content as the start frame has. This asymmetry in the control strength of the two frames over the intermediate content likely leads to inconsistent motion or appearance collapse in the generated frames. To efficiently achieve symmetric constraints from the start and end frames, we propose a novel framework, termed Sci-Fi, which applies a stronger injection mechanism to the constraint that receives less training. Specifically, it handles the start-frame constraint as before, while introducing the end-frame constraint through an improved mechanism. The new mechanism is based on a well-designed lightweight module, named EF-Net, which encodes only the end frame and expands it into temporally adaptive frame-wise features that are injected into the I2V-DM. This makes the end-frame constraint as strong as the start-frame constraint, enabling Sci-Fi to produce more harmonious transitions in various scenarios. Extensive experiments demonstrate the superiority of Sci-Fi over other baselines.
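
The abstract says that EF-Net expands a single end-frame encoding into frame-wise features that are injected into the I2V-DM. The toy sketch below only illustrates that general idea and is not the authors' architecture. Every name in it (`temporal_weights`, `expand_end_frame`, `inject`) is hypothetical, the fixed linear ramp stands in for whatever temporal adaptation the real module learns, and additive injection is an assumption.

```python
# Toy illustration only: all names are hypothetical and the linear ramp is an
# assumed stand-in for a learned, temporally adaptive expansion.

def temporal_weights(num_frames):
    """Per-frame weight: 0.0 at the start frame, rising linearly to 1.0 at the end frame."""
    if num_frames < 2:
        raise ValueError("need at least a start frame and an end frame")
    return [t / (num_frames - 1) for t in range(num_frames)]


def expand_end_frame(end_feat, num_frames):
    """Expand one end-frame feature vector into one feature vector per frame."""
    return [[w * v for v in end_feat] for w in temporal_weights(num_frames)]


def inject(hidden, frame_feats):
    """Add the frame-wise end-frame features to per-frame hidden states (assumed additive)."""
    if len(hidden) != len(frame_feats):
        raise ValueError("hidden states and frame features must cover the same frames")
    return [
        [h + f for h, f in zip(h_row, f_row)]
        for h_row, f_row in zip(hidden, frame_feats)
    ]


if __name__ == "__main__":
    end_feat = [2.0, 4.0]            # stand-in for an encoded end frame
    hidden = [[1.0, 1.0]] * 3        # stand-in for three frames of hidden states
    feats = expand_end_frame(end_feat, num_frames=3)
    print(inject(hidden, feats))
```

In this sketch the end-frame signal is weakest next to the start frame and strongest at the end frame, so each intermediate frame receives its own share of the end-frame constraint instead of one shared copy.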
