Generalizable Implicit Motion Modeling for Video Frame Interpolation
July 11, 2024
Authors: Zujin Guo, Wei Li, Chen Change Loy
cs.AI
Abstract
Motion modeling is critical in flow-based Video Frame Interpolation (VFI).
Existing paradigms either consider linear combinations of bidirectional flows
or directly predict bilateral flows for given timestamps without exploring
favorable motion priors, thus lacking the capability of effectively modeling
spatiotemporal dynamics in real-world videos. To address this limitation, in
this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel
and effective approach to motion modeling for VFI. Specifically, to enable GIMM
as an effective motion modeling paradigm, we design a motion encoding pipeline
to model spatiotemporal motion latent from bidirectional flows extracted from
pre-trained flow estimators, effectively representing input-specific motion
priors. Then, we implicitly predict arbitrary-timestep optical flows between two
adjacent input frames via an adaptive coordinate-based neural network, with
spatiotemporal coordinates and motion latent as inputs. Our GIMM can be
smoothly integrated with existing flow-based VFI works without further
modifications. We show that GIMM performs better than the current state of the
art on the VFI benchmarks.Summary
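For context, the "linear combinations of bidirectional flows" mentioned above typically rest on a constant-velocity motion assumption. A well-known instance is the formulation popularized by Super SloMo (Jiang et al., 2018), reproduced here for illustration only; it is not taken from this paper:

```latex
% Linear-combination paradigm (illustrative): intermediate flows at
% timestep t \in (0, 1) as weighted sums of the bidirectional flows.
\hat{F}_{t \to 0} = -(1 - t)\, t\, F_{0 \to 1} + t^{2}\, F_{1 \to 0}
\hat{F}_{t \to 1} = (1 - t)^{2}\, F_{0 \to 1} - t\, (1 - t)\, F_{1 \to 0}
```

GIMM departs from this fixed parametrization by learning the timestep dependence implicitly, conditioned on a motion latent derived from the input flows.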
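To make the second stage concrete, below is a minimal, hypothetical sketch of a coordinate-based implicit flow predictor in the spirit of the abstract: an MLP maps a spatiotemporal coordinate (x, y, t) plus a per-pixel motion latent to a 2D flow vector. The class name, dimensions, and plain-MLP design are illustrative assumptions, not the authors' implementation (the paper describes its network only as an "adaptive coordinate-based neural network"):

```python
# Hypothetical sketch, not the authors' code: a coordinate-based MLP that
# predicts a flow vector from a spatiotemporal coordinate and a motion latent.
import torch
import torch.nn as nn

class ImplicitFlowMLP(nn.Module):
    def __init__(self, latent_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        # Input: 3 coordinates (x, y, t) concatenated with the motion latent.
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # 2D optical-flow vector (u, v)
        )

    def forward(self, coords: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
        # coords: (N, 3) normalized (x, y, t), with t in [0, 1] between the
        # two input frames; latent: (N, latent_dim) motion latent sampled per
        # pixel (e.g., encoded from bidirectional flows of a pre-trained
        # estimator, per the abstract).
        return self.net(torch.cat([coords, latent], dim=-1))

# Usage: query flows at an arbitrary timestep for a batch of pixel locations.
model = ImplicitFlowMLP()
coords = torch.rand(1024, 3)    # random normalized (x, y, t) samples
latent = torch.randn(1024, 64)  # placeholder motion latents
flow = model(coords, latent)    # (1024, 2) predicted flow vectors
```

Because the network is queried with a continuous t, flows for any in-between timestep come from the same model, which is what allows GIMM-style interpolation at arbitrary timesteps without retraining.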