Sci-Fi: Symmetric Constraint for Frame Inbetweening

Abstract

Frame inbetweening aims to synthesize an intermediate video sequence conditioned on given start and end frames. Current state-of-the-art methods mainly extend large-scale pre-trained Image-to-Video Diffusion Models (I2V-DMs) by incorporating end-frame constraints, either through direct fine-tuning or without any training. We identify a critical limitation in this design: these methods usually inject the end-frame constraint through the same mechanism that originally imposed the start-frame (single-image) constraint. However, because the original I2V-DMs are already thoroughly trained on the start-frame condition, introducing the end-frame constraint through that same mechanism with far less (or even zero) specialized training is unlikely to give the end frame as strong an influence on the intermediate content as the start frame has. This asymmetric control strength of the two frames over the intermediate content is likely to cause inconsistent motion or appearance collapse in the generated frames. To efficiently achieve symmetric constraints from the start and end frames, we propose a novel framework, termed Sci-Fi, which applies a stronger injection to the constraint that receives less training. Specifically, Sci-Fi handles the start-frame constraint as before, while introducing the end-frame constraint through an improved mechanism. This mechanism is built on a lightweight module, named EF-Net, which encodes only the end frame and expands it into temporally adaptive frame-wise features that are injected into the I2V-DM. This makes the end-frame constraint as strong as the start-frame constraint, enabling Sci-Fi to produce more harmonious transitions across various scenarios. Extensive experiments demonstrate the superiority of Sci-Fi over other baselines.
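The abstract describes the end-frame pathway only at a high level: encode the end frame once, expand the result into one feature per video frame in a temporally adaptive way, and inject those features into the I2V-DM. The toy sketch below illustrates that data flow under loudly stated assumptions and is not the paper's EF-Net. The linear-projection encoder, the FiLM-style per-frame scale and shift, the residual-addition injection, and the linear ramp schedule are all hypothetical stand-ins, as are the function names `encode_end_frame`, `expand_to_frames`, and `inject`.

```python
"""Toy sketch of end-frame feature injection (hypothetical, not the paper's EF-Net)."""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode_end_frame(end_frame: Vector, proj: Matrix) -> Vector:
    """Assumed encoder: a single linear projection of the flattened end frame.

    `proj` has shape (feature_dim, num_pixels). The real EF-Net is a learned
    network whose internals are not specified in the abstract.
    """
    return [sum(w * x for w, x in zip(row, end_frame)) for row in proj]


def expand_to_frames(feat: Vector, scales: Vector, shifts: Vector) -> List[Vector]:
    """Expand one end-frame feature into one feature vector per video frame.

    Each frame t gets its own (scale, shift) pair: a hypothetical FiLM-style
    modulation standing in for "temporally adaptive" expansion.
    """
    if len(scales) != len(shifts):
        raise ValueError("need exactly one scale and one shift per frame")
    return [[s * f + b for f in feat] for s, b in zip(scales, shifts)]


def inject(hidden: List[Vector], frame_feats: List[Vector]) -> List[Vector]:
    """Assumed injection: residual addition to the I2V-DM's per-frame hidden states."""
    if len(hidden) != len(frame_feats):
        raise ValueError("need exactly one end-frame feature per video frame")
    return [[h + f for h, f in zip(hs, fs)] for hs, fs in zip(hidden, frame_feats)]


if __name__ == "__main__":
    num_frames = 4
    end_frame = [1.0, 2.0]                   # a 2-"pixel" end frame
    proj = [[0.5, 0.5], [1.0, -1.0]]         # feature_dim = 2
    feat = encode_end_frame(end_frame, proj)

    # Hypothetical schedule: the end frame's influence ramps from 0 at the
    # first frame (where the start frame dominates) to 1 at the last frame.
    scales = [t / (num_frames - 1) for t in range(num_frames)]
    shifts = [0.0] * num_frames
    frame_feats = expand_to_frames(feat, scales, shifts)

    hidden = [[0.0, 0.0] for _ in range(num_frames)]  # stand-in I2V-DM states
    for t, state in enumerate(inject(hidden, frame_feats)):
        print(f"frame {t}: {state}")
```

In a trained model the per-frame scales and shifts would presumably be learned rather than fixed; the ramp here only makes concrete the idea that the end frame contributes a distinct, time-dependent feature to every frame instead of conditioning a single slot, which is the contrast the abstract draws with reusing the start-frame mechanism.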
