
Generalizable Implicit Motion Modeling for Video Frame Interpolation

July 11, 2024
作者: Zujin Guo, Wei Li, Chen Change Loy
cs.AI

Abstract

Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability to effectively model spatiotemporal dynamics in real-world videos. To address this limitation, in this study we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model a spatiotemporal motion latent from bidirectional flows extracted by pre-trained flow estimators, effectively representing input-specific motion priors. We then implicitly predict arbitrary-timestep optical flows between two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and the motion latent as inputs. Our GIMM can be smoothly integrated into existing flow-based VFI works without further modification. We show that GIMM performs better than the current state of the art on VFI benchmarks.
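The core idea of the coordinate-based prediction step can be illustrated with a minimal sketch: a small network maps spatiotemporal coordinates (x, y, t) plus a motion latent sampled at that location to a 2D flow vector, so flows at arbitrary timesteps can be queried continuously. The layer sizes, latent dimension, and the use of a plain numpy MLP here are illustrative assumptions, not the paper's actual architecture or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 16  # assumed size of the spatiotemporal motion latent
HIDDEN = 64      # assumed hidden width; the real model differs

# Randomly initialized weights stand in for a trained network.
W1 = rng.standard_normal((3 + LATENT_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 2)) * 0.1
b2 = np.zeros(2)

def predict_flow(coords, latent):
    """coords: (N, 3) array of normalized (x, y, t) with t in [0, 1];
    latent: (N, LATENT_DIM) motion latent sampled at each coordinate.
    Returns (N, 2) optical-flow vectors (u, v)."""
    h = np.tanh(np.concatenate([coords, latent], axis=1) @ W1 + b1)
    return h @ W2 + b2

# Query the flow at an arbitrary intermediate timestep t = 0.37
# for every pixel of a tiny 4x4 grid.
H, W = 4, 4
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
coords = np.stack(
    [xs.ravel() / W, ys.ravel() / H, np.full(H * W, 0.37)], axis=1
)
latent = rng.standard_normal((H * W, LATENT_DIM))
flow = predict_flow(coords, latent)
print(flow.shape)  # (16, 2): one (u, v) vector per queried pixel
```

Because the timestep t is a continuous input rather than a fixed target, the same network can be evaluated at any t in [0, 1], which is what allows arbitrary-timestep interpolation without retraining.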

